arXiv:1503.05434v1 [cs.IT] 17 Mar 2015


MARCH 2015 




Compressed Differential Erasure Codes 
for Efficient Archival of Versioned Data 

J. Harshan, Anwitaman Datta, Frederique Oggier 


Abstract: In this paper, we study the problem of storing an archive of versioned data in a reliable and efficient manner in distributed storage systems. We propose a new storage technique called differential erasure coding (DEC) where the differences (deltas) between subsequent versions are stored rather than the whole objects, akin to a typical delta encoding technique. However, unlike delta encoding techniques, DEC opportunistically exploits the sparsity (i.e., when the differences between two successive versions have few non-zero entries) in the updates to store the deltas using compressed sensing techniques applied with erasure coding. We first show that DEC provides significant savings in the storage size for versioned data whenever the update patterns are characterized by in-place alterations. Subsequently, we propose a practical DEC framework so as to reap storage size benefits against not just in-place alterations but also real-world update patterns such as insertions and deletions that alter the overall data sizes. We conduct experiments with several synthetic workloads to demonstrate that the practical variant of DEC provides significant reductions in storage overhead (up to 60% depending on the workload) compared to a baseline storage system which incorporates concepts from Rsync, a delta encoding technique to store and synchronize data across a network.

Index Terms: Datacenter networking, fault tolerance, erasure coding, version management, compressed sensing

1 Introduction

Distributed storage systems enable the storage of huge amounts of data across networks of storage nodes. In such systems, redundancy of the stored data is critical to ensure fault tolerance against node failures. While data replication remains a practical way of realizing this redundancy, the past years have witnessed the adoption of erasure codes for data archival (e.g., in Microsoft Azure [1], Hadoop FS [?], or Google File System [?]), which offer a better trade-off between storage overhead and fault tolerance. The design of erasure coding techniques amenable to reliable and efficient storage has accordingly garnered considerable attention [?], [?].

Recent research works in distributed storage systems have predominantly focused on efficient storage of stand-alone data objects. Not many have addressed the aspect of efficiently storing multiple versions of data. Recently, Wang and Cadambe [?] have addressed multi-version coding for distributed data, where the underlying problem is to encode different versions so that certain subsets of storage nodes can be accessed to retrieve the most common version among them. Their strategy has been shown to be applicable when the updates for the latest version do not reach all the nodes, possibly due to network problems. In this paper, we investigate a new aspect of erasure code design, aimed at storing multiple versions of data. The presented work is loosely related to the issues of efficient updates [?], [?], [?], [10], and of deduplication [?]. Existing works on the update of erasure coded data focus on the computational and communication efficiency in carrying out the updates, with the goal of storing only the latest version of the data, and thus do not delve into efficient storage or manipulation of the previous versions. Deduplication is the process of eliminating duplicate data blocks in order to remove unnecessary redundancy. Though we do not directly address it, the update semantic we use, that of storing only the differences across versions (akin to SVN [12]), means unnecessary duplicates are not created while storing the many versions, while our coding technique itself focuses on reducing the storage and I/O overheads of reliably storing the differences across versions.

The need to store multiple versions of data arises in many scenarios. For instance, when editing and updating files, users may want to explicitly create a version repository using a framework like SVN [12] or Git [?]. Cloud-based document editing or storage services also often provide users with access to older versions of the documents. Another scenario is that of system-level back-up, where directories, whole file systems or databases are archived, and versions refer to the different system snapshots. In either of the two file-centric settings, irrespective of whether a working copy used during editing is stored locally or on the cloud, or in a system-level back-up, say using copy-on-write [14], the back-end storage system needs to preserve the different versions reliably, and can leverage erasure coding to reduce the storage overheads.

We propose a new differential erasure coding (DEC) framework that falls under the umbrella of delta encoding techniques, where the differences (deltas) between subsequent versions are stored rather than the whole objects. The proposed technique exploits the sparsity in the differences among versions by applying techniques from compressed sensing [17], in order to reduce the storage overhead (see Sections 2 and 3). We have already proposed the idea of combining compressed sensing with erasure coding in [15], where we studied the benefits purely in terms of I/O, and for objects of fixed size. While retaining the combination of compressed sensing and erasure coding, this work introduces a different erasure coding strategy which provides storage overhead benefits. We first present a simplistic layout of DEC that relies on fixed object lengths across successive versions of the data, so as to evaluate the right choice of erasure codes to store versioned data. We show that when all the versions are fetched together, there is also an equivalent gain in I/O operations. This comes at an increased I/O overhead when accessing individual versions. We accordingly propose some heuristics to optimize the basic DEC, and demonstrate that they mitigate the drawbacks adequately without compromising the gains. Further, we show that the combination of compressed sensing and erasure coding yields other practical benefits, such as the possibility of employing fewer erasure codes against different sparsity levels of the update patterns (see Section 4).

In the later part of this paper, we extend the preliminary ideas of DEC to develop a framework for practical DEC that is robust to real-world update patterns across versions, such as insertions and deletions, which may alter the overall size of the data object. Along that direction, we acknowledge that insertions and deletions may ripple changes across the object at the coding granularity, and may also increase the object size. Such a rippling effect could in particular render DEC useless and wipe out its benefits. To circumvent such hurdles, we apply zero padding schemes, a natural way to mitigate these problems, taking into account insertions, deletions and in-place alterations (see Section 6). For storing versioned data, the total storage size for deltas inclusive of zero pads (prior to erasure coding) is used as the metric to evaluate the quantum and placement of zero pads against a wide range of workloads that include insertions and deletions, both bursty and distributed in nature (see Section [?]). As a baseline, we choose an intuitive setup that uses concepts from Rsync [16], a delta encoding technique to store and synchronize files over the network, to store successive versions of the object. We compare the storage savings offered by the practical DEC technique with the baseline and show that the savings from DEC can reach 60% depending on the distribution of the workload.

1.1 System Model for Version Management 

Any digital content to be stored, be it a file, directory, database, or a whole file system, is divided into data chunks, shown as phase (1) in Fig. 1. The proposed coding techniques are agnostic of the nuances of the upper levels, and all subsequent discussions will be at the granularity of these chunks, which we will refer to as data objects or just objects.

[Figure 1 panel labels: digital content; create fixed-size chunks (referred to as data objects); modification; compress the difference]

Fig. 1. An overview of the coding strategy using compressed differences

Formally, we denote by $\mathbf{x} \in \mathbb{F}_q^k$ a data object to be stored over a network, that is, the object is seen as a vector of $k$ blocks (phase (2)) taking values in the alphabet $\mathbb{F}_q$, with $\mathbb{F}_q$ the finite field with $q$ elements, $q$ typically a power of 2. Encoding for archival of an object $\mathbf{x}$ across $n$ nodes is done (phase (3)) using an $(n, k)$ linear code, that is, $\mathbf{x}$ is mapped to the codeword

$$\mathbf{c} = G\mathbf{x} \in \mathbb{F}_q^n, \quad n > k, \tag{1}$$

for $G$ an $n \times k$ generator matrix with coefficients in $\mathbb{F}_q$. We use the term systematic to refer to a codeword $\mathbf{c}$ whose first $k$ components are $\mathbf{x}$, that is, $c_i = x_i$ for $i = 1, \ldots, k$. This describes the standard encoding procedure used in erasure coding based storage systems. We suppose next that the content mutates, and we wish to store all the versions.

Let $\mathbf{x}_1 \in \mathbb{F}_q^k$ denote the first version of a data object to be stored. When it is modified (phase (4)), a new version $\mathbf{x}_2 \in \mathbb{F}_q^k$ of this object is created. More generally, a new version $\mathbf{x}_{j+1}$ is obtained from $\mathbf{x}_j$ to produce over time a sequence $\{\mathbf{x}_j \in \mathbb{F}_q^k,\ j = 1, 2, \ldots, L < \infty\}$ of $L$ different versions of a data object, to be stored in the network. We are not concerned with the application-level semantic of the modifications, but with the bit-level changes in the object. Thus the changes between two successive versions are captured by the relation

$$\mathbf{x}_{j+1} = \mathbf{x}_j + \mathbf{z}_{j+1}, \tag{2}$$

where $\mathbf{z}_{j+1} \in \mathbb{F}_q^k$ denotes the modifications (in phase (5)) of the $j$-th update. We first assume fixed object lengths across successive versions of data so as to build an uncomplicated framework for the differential strategy. Such a framework shields us from unnecessarily delving into system specificities and instead serves as a foundation to evaluate various erasure coding techniques to store multiple versions of data. We show that the design, analysis and assessment of the coding techniques are oblivious to the nuances of how the data object is broken down into several chunks prior to encoding, which lets us treat chunk synthesis and erasure coding as two independent blocks. In the later part of this work (see Section 6), we discuss how to relax the fixed object length assumption and yet develop a practical DEC scheme that is robust to variable object lengths across successive versions.
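As an illustration of relation (2), the following minimal Python sketch computes a difference vector and its number of non-zero entries. It assumes symbols from a field of characteristic 2, so that subtraction is a bitwise XOR of the chunk bit patterns; the helper names `to_symbols`, `delta` and `sparsity` are hypothetical, and the symbols are kept as raw integers rather than as polynomials in a field generator.

```python
def to_symbols(bits, k):
    """Split a bit string into k equal-size chunks, each read as an integer symbol."""
    if len(bits) % k != 0:
        raise ValueError("bit string length must be a multiple of k")
    width = len(bits) // k
    return [int(bits[i * width:(i + 1) * width], 2) for i in range(k)]


def delta(x_old, x_new):
    """Difference z = x_new - x_old; in characteristic 2 this is a symbol-wise XOR."""
    return [a ^ b for a, b in zip(x_old, x_new)]


def sparsity(z):
    """Number of non-zero entries of z."""
    return sum(1 for s in z if s != 0)


# Two hypothetical 12-bit versions cut into k = 4 chunks of 3 bits each.
x1 = to_symbols("100110010010", k=4)   # chunks 100, 110, 010, 010
x2 = to_symbols("100110110010", k=4)   # chunks 100, 110, 110, 010
z2 = delta(x1, x2)
print(z2, sparsity(z2))                # [0, 0, 4, 0] 1
```

Only the third chunk differs, so the update has a single non-zero entry, which is the situation the differential strategy is designed to exploit.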

The key idea is that when the changes from $\mathbf{x}_j$ to $\mathbf{x}_{j+1}$ are small (decided by the sparsity of $\mathbf{z}_{j+1}$), it is possible to apply compressed sensing [17], which makes it possible to represent a $k$-length $\gamma$-sparse vector $\mathbf{z}$ (see Definition 1) with fewer than $k$ components (phase (6)) through a linear transformation on $\mathbf{z}$, which does not depend on the position of the non-zero entries, in order to gain in storage efficiency.

Definition 1: For some integer $1 \le \gamma < k$, a vector $\mathbf{z} \in \mathbb{F}_q^k$ is said to be $\gamma$-sparse if it contains at most $\gamma$ non-zero entries.

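Let $\mathbf{z} \in \mathbb{F}_q^k$ be $\gamma$-sparse such that $\gamma < \frac{k}{2}$, and let $\Phi \in \mathbb{F}_q^{2\gamma \times k}$ denote the measurement matrix used for compressed sensing. The compressed representation $\mathbf{z}' \in \mathbb{F}_q^{2\gamma}$ of $\mathbf{z}$ is obtained as

$$\mathbf{z}' = \Phi \mathbf{z}. \tag{3}$$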

The following proposition [18] gives a sufficient condition on $\Phi$ to uniquely recover $\mathbf{z}$ from $\mathbf{z}'$ using a syndrome decoder.

Proposition 1: If any $2\gamma$ columns of $\Phi$ are linearly independent, the $\gamma$-sparse vector $\mathbf{z}$ can be recovered from $\mathbf{z}'$.
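The compression in (3) and the recovery promised by Proposition 1 can be sketched as follows. This is only an illustration under stated assumptions: it works over a hypothetical prime field GF(257) instead of a field of size a power of 2, it uses a Vandermonde matrix as one possible measurement matrix whose every $2\gamma$ columns are linearly independent, and it replaces an efficient syndrome decoder with an exhaustive search over supports. All function names are hypothetical.

```python
from itertools import combinations

P = 257  # hypothetical prime modulus, chosen only to keep field arithmetic simple


def vandermonde(rows, k):
    """rows x k matrix with entry (c + 1)^r in row r, column c.

    Any `rows` columns form a square Vandermonde matrix on distinct points,
    so they are linearly independent over GF(P).
    """
    return [[pow(c + 1, r, P) for c in range(k)] for r in range(rows)]


def compress(phi, z):
    """z' = Phi z over GF(P)."""
    return [sum(a * b for a, b in zip(row, z)) % P for row in phi]


def solve(a, b):
    """Solve the square system a y = b over GF(P) by Gauss-Jordan; None if singular."""
    m = len(a)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(m):
        piv = next((r for r in range(col, m) if aug[r][col]), None)
        if piv is None:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], -1, P)
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * w) % P for v, w in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def recover(phi, z_prime, k, gamma):
    """Exhaustive search for the unique gamma-sparse z with Phi z = z'."""
    for support in combinations(range(k), gamma):
        sub = [[phi[r][c] for c in support] for r in range(gamma)]
        y = solve(sub, z_prime[:gamma])
        if y is None:
            continue
        z = [0] * k
        for c, v in zip(support, y):
            z[c] = v
        if compress(phi, z) == z_prime:   # check all 2*gamma measurements
            return z
    return None


k, gamma = 10, 2
phi = vandermonde(2 * gamma, k)
z = [0, 0, 5, 0, 0, 0, 0, 7, 0, 0]
z_prime = compress(phi, z)               # 4 symbols instead of 10
print(recover(phi, z_prime, k, gamma) == z)
```

The search is exponential in $\gamma$ and is meant only to show why the column-independence condition guarantees a unique answer: two distinct $\gamma$-sparse candidates would differ by a vector with at most $2\gamma$ non-zero entries lying in the kernel of $\Phi$.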

Once sparse modifications are compressed, which reduces the I/O reads, they are encoded into codewords of length less than $n$ (phase (7)), decreasing in turn the storage overhead.

2 Differential Erasure Encoding for 
Version-control 

Let $\{\mathbf{x}_j \in \mathbb{F}_q^k,\ 1 \le j \le L\}$ be the sequence of versions of a data object to be stored. The changes from $\mathbf{x}_j$ to $\mathbf{x}_{j+1}$ are reflected in the vector $\mathbf{z}_{j+1} = \mathbf{x}_{j+1} - \mathbf{x}_j$ in (2), which is $\gamma_{j+1}$-sparse (see Definition 1) for some $1 \le \gamma_{j+1} \le k$. The value $\gamma_{j+1}$ may a priori vary across versions of one object, and across application domains. All the versions $\mathbf{x}_1, \ldots, \mathbf{x}_L$ need protection from node failures, and are archived using a linear erasure code (see (1)).

2.1 Object Encoding 

We describe a generic differential encoding (called Step $j + 1$) suited for efficient archival of versioned data, which exploits the sparsity of the updates, when $\gamma_{j+1} < \frac{k}{2}$, to reduce the storage overheads of archiving all the versions reliably. We assume that one storage node is in possession of two versions, say $\mathbf{x}_j$ and $\mathbf{x}_{j+1}$, of one data object, $j = 1, \ldots, L - 1$. The corresponding implementation is discussed in Subsection 2.2.

Step $j + 1$. Two versions $\mathbf{x}_j$ and $\mathbf{x}_{j+1}$ are located in one storage node. The difference vector $\mathbf{z}_{j+1} = \mathbf{x}_{j+1} - \mathbf{x}_j$ and the corresponding sparsity level $\gamma_{j+1}$ are computed. If $\gamma_{j+1} \ge \frac{k}{2}$, the object $\mathbf{z}_{j+1}$ is encoded as $\mathbf{c}_{j+1} = G\mathbf{z}_{j+1}$. On the other hand, if $\gamma_{j+1} < \frac{k}{2}$, then $\mathbf{z}_{j+1}$ is first compressed (see (3)) as

$$\mathbf{z}'_{j+1} = \Phi_{\gamma_{j+1}} \mathbf{z}_{j+1},$$

where $\Phi_{\gamma_{j+1}} \in \mathbb{F}_q^{2\gamma_{j+1} \times k}$ is a measurement matrix such that any $2\gamma_{j+1}$ of its columns are linearly independent (see Proposition 1). Subsequently, $\mathbf{z}'_{j+1}$ is encoded as

$$\mathbf{c}_{j+1} = G_{\gamma_{j+1}} \mathbf{z}'_{j+1},$$

where $G_{\gamma_{j+1}} \in \mathbb{F}_q^{n_{\gamma_{j+1}} \times 2\gamma_{j+1}}$ is the generator matrix of an $(n_{\gamma_{j+1}}, 2\gamma_{j+1})$ erasure code with storage overhead $\kappa$. The components of $\mathbf{c}_{j+1}$ are distributed across a set $\mathcal{N}_{j+1}$ of $n_{\gamma_{j+1}}$ nodes, whose choice is discussed in Section 2.2. Since $\gamma_{j+1}$ is random, a total of $\lceil \frac{k}{2} \rceil$ erasure codes denoted by $\mathcal{G} = \{G, G_1, \ldots, G_{\lceil k/2 \rceil - 1}\}$, and a total of $\lceil \frac{k}{2} \rceil - 1$ measurement matrices denoted by $\Sigma = \{\Phi_1, \Phi_2, \ldots, \Phi_{\lceil k/2 \rceil - 1}\}$, have to be designed a priori. The erasure codes may be taken systematic and/or MDS (that is, such that any pattern of $n - k$ failures is tolerated); our scheme works irrespective of these choices. This encoding strategy implies one extra matrix multiplication whenever a sparse difference vector is obtained.

We give a toy example to illustrate the computations.

Example 1: Take $k = 4$, suppose that the digital content is written in binary as (100110010010) and that the linear code used for storage is a (6, 4) code over $\mathbb{F}_8$. To create the first data object $\mathbf{x}_1$, cut the digital content into $k = 4$ chunks 100, 110, 010, 010, so that $\mathbf{x}_1$ is written over $\mathbb{F}_8$ as $\mathbf{x}_1 = (1, 1 + w, w, w)$, where $w$ is the generator of $\mathbb{F}_8$, satisfying $w^3 = w + 1$. The next version of the digital content is created, say (100110110010). Similarly, $\mathbf{x}_2$ becomes $\mathbf{x}_2 = (1, 1 + w, 1 + w, w)$, and the difference vector $\mathbf{z}_2$ is given by $\mathbf{z}_2 = \mathbf{x}_2 - \mathbf{x}_1 = (0, 0, 1, 0)$, with $\gamma_2 = 1 < k/2$. Apply a measurement matrix $\Phi_{\gamma_2} = \Phi_1$ to compress $\mathbf{z}_2$:

$$\Phi_1 \mathbf{z}_2 = \cdots \; w + 1$$

Note that every two columns of $\Phi_1$ are linearly independent (see Proposition 1), thus allowing the compressed vector to be recovered. Encode $\mathbf{z}'_2$ using a single parity check code:

$$\mathbf{c}_2 = G_1 \mathbf{z}'_2.$$

procedure ENCODE($\mathcal{X}$, $\mathcal{G}$, $\Sigma$)
    FOR $0 \le j \le L - 1$
        IF $j = 0$
            return $\mathbf{c}_1 = G\mathbf{x}_1$;
        ELSE (this part summarizes Step $j + 1$ in the text)
            Compute $\mathbf{z}_{j+1} = \mathbf{x}_{j+1} - \mathbf{x}_j$;
            Compute $\gamma_{j+1}$;
            IF $\gamma_{j+1} \ge \frac{k}{2}$
                return $\mathbf{c}_{j+1} = G\mathbf{z}_{j+1}$;
            ELSE
                Compress $\mathbf{z}_{j+1}$ as $\mathbf{z}'_{j+1} = \Phi_{\gamma_{j+1}} \mathbf{z}_{j+1}$;
                return $\mathbf{c}_{j+1} = G_{\gamma_{j+1}} \mathbf{z}'_{j+1}$;
            END IF
        END IF
    END FOR
end procedure

Fig. 2. Encoding Procedure for DEC
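The encoding procedure can be sketched in Python as follows. This is a hedged illustration and not the implementation used in the paper: it assumes a hypothetical prime field GF(257), an integer storage overhead `kappa`, and Vandermonde matrices for both the generator matrices (any $m$ rows independent, hence MDS) and the measurement matrices (any $2\gamma$ columns independent). The names `generator`, `measurement`, `matvec` and `dec_encode` are hypothetical.

```python
P = 257  # hypothetical prime field size; the text uses F_q with q a power of 2


def generator(n, m):
    """n x m Vandermonde generator matrix: any m rows are linearly independent."""
    return [[pow(r + 1, c, P) for c in range(m)] for r in range(n)]


def measurement(gamma, k):
    """2*gamma x k matrix whose every 2*gamma columns are linearly independent."""
    return [[pow(c + 1, r, P) for c in range(k)] for r in range(2 * gamma)]


def matvec(m, v):
    """Matrix-vector product over GF(P)."""
    return [sum(a * b for a, b in zip(row, v)) % P for row in m]


def dec_encode(versions, kappa=2):
    """Return the codewords c_1, ..., c_L of the forward differential scheme."""
    k = len(versions[0])
    n = kappa * k
    codewords = [matvec(generator(n, k), versions[0])]        # c_1 = G x_1
    for prev, new in zip(versions, versions[1:]):
        z = [(b - a) % P for a, b in zip(prev, new)]           # z_{j+1}
        gamma = sum(1 for s in z if s)                          # sparsity level
        if 2 * gamma >= k:                                      # gamma >= k/2
            codewords.append(matvec(generator(n, k), z))        # c = G z
        else:
            z_c = matvec(measurement(gamma, k), z)              # z' = Phi z
            g_small = generator(kappa * 2 * gamma, 2 * gamma)   # (kappa*2g, 2g) code
            codewords.append(matvec(g_small, z_c))              # c = G_gamma z'
    return codewords


versions = [[1, 3, 2, 2], [1, 3, 3, 2], [5, 6, 7, 2]]          # hypothetical symbols
print([len(c) for c in dec_encode(versions)])                  # [8, 4, 8]
```

The second version differs from the first in one symbol, so its codeword has length 4 instead of 8; the third update touches three of the four symbols and is therefore encoded without compression.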

2.2 Implementation and Placement 

Caching. To store $\mathbf{x}_{j+1}$ for $j \ge 1$, the proposed scheme requires the calculation of differences between the existing version $\mathbf{x}_j$ and the new version $\mathbf{x}_{j+1}$ in (2). However, it does not store $\mathbf{x}_j$, but $\mathbf{x}_1$ together with $\mathbf{z}_2, \ldots, \mathbf{z}_j$. Reconstructing $\mathbf{x}_j$ before computing the difference and encoding the new difference is expensive in terms of I/O operations, network bandwidth, latency as well as computations. A practical remedy is thus to cache a full copy of the latest version $\mathbf{x}_j$ until a new version $\mathbf{x}_{j+1}$ arrives. This also helps in improving the response time and overheads of data read operations in general, and thus decouples the system performance from the storage-efficient, resilient storage of all the versions.

Considering caching as a practical method, the algorithm in Fig. 2 summarizes the differential erasure coding (DEC) procedure. The input and the output of the algorithm are $\mathcal{X} = \{\mathbf{x}_j \in \mathbb{F}_q^k,\ 1 \le j \le L\}$ and $\{\mathbf{c}_j,\ 1 \le j \le L\}$, respectively.

Placement consideration. The choice of the sets $\mathcal{N}_{j+1}$, $j = 0, \ldots, L - 1$, of nodes over which the different versions are stored needs closer examination. Since $\mathbf{x}_1$ together with $\mathbf{z}_2, \ldots, \mathbf{z}_j$ are needed to recover $\mathbf{x}_j$ (see also Subsection 2.4), if $\mathbf{x}_1$ is lost, $\mathbf{x}_j$ cannot be recovered, and thus there is no gain in fault tolerance by storing $\mathbf{x}_j$ in a different set of nodes than $\mathcal{N}_1$. Furthermore, since $n_{\gamma_i} < n$, the codewords $\mathbf{c}_i$ may have different resilience to failures. The dependency of $\mathbf{x}_j$ on previous versions suggests that the fault tolerance of subsequent versions is determined by the worst fault tolerance achieved among the $\mathbf{c}_i$ for $i \le j$.

Example 2: We continue Example 1, where $\mathbf{x}_1$ is encoded into $\mathbf{c}_1 = (c_{11}, \ldots, c_{16})$ using a (6, 4) MDS code. Allocate $c_{1i}$ to $N_i$, that is, use the set $\mathcal{N}_1 = \{N_1, \ldots, N_6\}$ of nodes. Store $\mathbf{c}_2$ in $\mathcal{N}_2 = \{N_1, N_2, N_3\} \subset \mathcal{N}_1$ for collocated placement, and in $\mathcal{N}_2 = \{N'_1, N'_2, N'_3\}$, $\mathcal{N}_2 \cap \mathcal{N}_1 = \emptyset$, for distributed placement. Let $p$ be the probability that a node fails, and assume that failures are independent. We compute the probability to recover both $\mathbf{x}_1$ and $\mathbf{x}_2$ in case of node failures (known as static resilience) for both distributed and collocated strategies.

Fig. 3. Placement consideration: comparing probability that both versions are available

For distributed placement, the set of error events for losing $\mathbf{x}_1$ is $\mathcal{E}_1 = \{\text{3 or more nodes fail in } \mathcal{N}_1\}$. Hence, the probability $\mathrm{Prob}(\mathcal{E}_1)$ of losing $\mathbf{x}_1$ is given by

$$p^6 + C^6_5\, p^5 (1 - p) + C^6_4\, p^4 (1 - p)^2 + C^6_3\, p^3 (1 - p)^3, \tag{4}$$

where $C^m_r$ denotes the $m$ choose $r$ operation. The set of error events for losing $\mathbf{z}_2$ stored with a (3, 2) MDS code is $\mathcal{E}_2 = \{\text{2 or 3 nodes fail in } \mathcal{N}_2\}$. Thus, $\mathbf{z}_2$ is lost with probability

$$\mathrm{Prob}(\mathcal{E}_2) = p^3 + C^3_2\, p^2 (1 - p). \tag{5}$$

From (4) and (5), the probability of retaining both versions is

$$\mathrm{Prob}_d(\mathbf{x}_1, \mathbf{x}_2) = (1 - \mathrm{Prob}(\mathcal{E}_1))(1 - \mathrm{Prob}(\mathcal{E}_2)). \tag{6}$$

The set of error events for losing $\mathbf{x}_1$ or $\mathbf{z}_2$ is

$$\mathcal{E}_1 \cup \mathcal{E}_2 = \{\text{3 or more nodes fail}\} \cup \{\text{specific 2 nodes failure}\}$$

for collocated placement. Out of $C^6_2$ possible 2-node failure patterns, 3 patterns contribute to the loss of the object $\mathbf{z}_2$. Therefore, $\mathrm{Prob}(\mathcal{E}_1 \cup \mathcal{E}_2)$ is

$$p^6 + C^6_5\, p^5 (1 - p) + C^6_4\, p^4 (1 - p)^2 + C^6_3\, p^3 (1 - p)^3 + 3 p^2 (1 - p)^4,$$

from which the probability of retaining both versions is

$$\mathrm{Prob}_c(\mathbf{x}_1, \mathbf{x}_2) = 1 - \mathrm{Prob}(\mathcal{E}_1 \cup \mathcal{E}_2). \tag{7}$$

In Fig. 3, we compare (6) and (7) for different values of $p$ from 0.001 to 0.05. The plot shows that collocated allocation results in better resilience than the distributed case.
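The two static resilience expressions of Example 2 can be evaluated numerically with the short Python sketch below. The function names are hypothetical; the formulas are the ones stated in the example, for a (6, 4) MDS code holding the first version and a (3, 2) MDS code holding the compressed difference.

```python
from math import comb


def prob_lose_x1(p):
    """(6,4) MDS code on 6 nodes: x_1 is lost when 3 or more nodes fail."""
    return sum(comb(6, i) * p**i * (1 - p)**(6 - i) for i in range(3, 7))


def prob_lose_z2(p):
    """(3,2) MDS code on 3 nodes: z_2 is lost when 2 or 3 nodes fail."""
    return p**3 + comb(3, 2) * p**2 * (1 - p)


def retain_distributed(p):
    """Both versions survive when c_2 sits on 3 nodes disjoint from those of c_1."""
    return (1 - prob_lose_x1(p)) * (1 - prob_lose_z2(p))


def retain_collocated(p):
    """Both versions survive when c_2 sits on 3 of the 6 nodes holding c_1.

    Besides 3 or more failures, the only fatal patterns are the 3 ways in
    which exactly 2 of the nodes holding c_2 fail and the other 4 survive.
    """
    return 1 - (prob_lose_x1(p) + 3 * p**2 * (1 - p)**4)


for p in (0.001, 0.01, 0.05):
    print(p, retain_distributed(p), retain_collocated(p))
```

For each of these failure probabilities the collocated value is the larger of the two, in line with the comparison reported for Fig. 3.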

Optimized Step $j + 1$. Based on these insights, a practical change of Step $j + 1$ is: if $\gamma_{j+1} \ge \frac{k}{2}$, $\mathbf{z}_{j+1}$ is discarded and $\mathbf{x}_{j+1}$ is encoded as $\mathbf{c}_{j+1} = G\mathbf{x}_{j+1}$, to ensure that a whole version is again encoded. Since many contiguous sparse versions may be created, we put as a heuristic an iteration threshold $i$, after which, even if all differences from one version to another stay very sparse, a whole version is used for coding and storage.

2.3 On the Storage Overhead 

Since the employed erasure codes depend on the sparsity level, the storage overhead of the above differential encoding improves upon that of encoding different versions independently. The average gains in storage overhead are discussed in Section 5. Formally, the total storage size till the $l$-th version is

$$\delta(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_l) = n + \sum_{j=2}^{l} \min(2\kappa\gamma_j, n) \le ln,$$

for $2 \le l \le L$. The storage overhead for the Optimized Step $j + 1$ is the same as that of Step $j + 1$ since, for $\gamma_{j+1} \ge \frac{k}{2}$, the coded objects $G\mathbf{x}_{j+1}$ and $G\mathbf{z}_{j+1}$ have the same size.
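A small numerical check of this storage formula is sketched below. The function name and the sparsity values are hypothetical and chosen only for illustration, with $k = 6$, $\kappa = 2$ and hence $n = 12$.

```python
def total_storage(gammas, n, kappa):
    """n + sum over j >= 2 of min(2 * kappa * gamma_j, n)."""
    return n + sum(min(2 * kappa * g, n) for g in gammas)


# Hypothetical sparsity levels gamma_2, gamma_3, gamma_4 for k = 6, n = 12.
profile = [1, 4, 2]
differential = total_storage(profile, n=12, kappa=2)
independent = (len(profile) + 1) * 12        # l * n when every version is coded in full
print(differential, independent)             # 36 48
```

The dense update (4 changed symbols out of 6) costs a full codeword of length 12, while the two sparse updates cost 4 and 8 symbols, so the bound by $ln$ holds with room to spare.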


2.4 Object Retrieval 

Suppose that $L$ versions of a data object are archived using Step $j + 1$, $j \le L - 1$, and the user needs to retrieve some $\mathbf{x}_l$, $1 < l \le L$. Assuming that there are enough encoded blocks for each $\mathbf{c}_i$ ($i \le l$) available, relevant nodes in the sets $\mathcal{N}_1, \ldots, \mathcal{N}_l$ are accessed to fetch and decode the $\mathbf{c}_i$ to obtain $\mathbf{x}_1$ and the $l - 1$ compressed differences $\mathbf{z}'_2, \mathbf{z}'_3, \ldots, \mathbf{z}'_l$. See Subsection 2.2 for a discussion on placement and an illustration that reusing the same set of nodes gives the best availability with MDS codes, hence bounding the number of accessed nodes by $|\mathcal{N}_1|$. All compressed differences sharing the same sparsity can be added first, and then decompressed, since

$$\sum_{j \in J_\gamma} \mathbf{z}'_j = \Phi_\gamma \sum_{j \in J_\gamma} \mathbf{z}_j$$

for $J_\gamma = \{j \mid \gamma_j = \gamma\}$. The cost of recovering $\sum_{j \in J_\gamma} \mathbf{z}_j$ is only one decompression instead of $|J_\gamma|$, with which $\mathbf{x}_l$ is given by

$$\mathbf{x}_l = \mathbf{x}_1 + \sum_{j=2}^{l} \mathbf{z}_j.$$

A minimum of $k$ I/O reads is needed to retrieve $\mathbf{x}_1$. For $\mathbf{z}_j$ ($2 \le j \le l$), the number of I/O reads may be lower than $k$, depending on the update sparsity. If $\gamma_j < \frac{k}{2}$, then $\mathbf{z}'_j$ is retrieved with $2\gamma_j$ I/O reads, while if $\gamma_j \ge \frac{k}{2}$, then $\mathbf{z}_j$ is recovered with $k$ I/O reads, so that $\min(2\gamma_j, k)$ I/O reads are needed for $\mathbf{z}_j$. The total number of I/O reads to retrieve $\mathbf{x}_l$ is

$$\eta(\mathbf{x}_l) = k + \sum_{j=2}^{l} \min(2\gamma_j, k) \tag{8}$$

and so is the total number of I/O reads to retrieve the first $l$ versions: $\eta(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_l) = \eta(\mathbf{x}_l)$.

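To retrieve $\mathbf{x}_l$ for $1 < l \le L$ when archival was done using Optimized Step $j + 1$, $j \le L - 1$, look for the most recent version $\mathbf{x}_{l'}$ such that $l' \le l$ and $\gamma_{l'} \ge \frac{k}{2}$ (taking $l' = 1$ if no such version exists). Then, using $\{\mathbf{x}_{l'}, \mathbf{z}_{l'+1}, \ldots, \mathbf{z}_l\}$, the object $\mathbf{x}_l$ is reconstructed as $\mathbf{x}_l = \mathbf{x}_{l'} + \sum_{j=l'+1}^{l} \mathbf{z}_j$. Hence, the total number of I/O reads is

$$\eta(\mathbf{x}_l) = k + \sum_{j=l'+1}^{l} \min(2\gamma_j, k). \tag{9}$$

The number of I/O reads to retrieve the first $l$ versions is the same as for Step $j + 1$.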

The benefits of the proposed differential encoding in terms of average number of I/O reads are presented in Section 5.

Example 3: Assume that $L = 20$ versions of an object of size $k = 10$ are differentially encoded, with sparsity profile $\{\gamma_j,\ 2 \le j \le L\} = \{3, 8, 3, 6, 7, 9, 10, 6, 2, 2, 3, 9, 3, 9, 3, 10, 4, 2, 3\}$. The storage pattern is $\{\mathbf{x}_1, \mathbf{z}_2, \mathbf{z}_3, \ldots, \mathbf{z}_{20}\}$. Assuming $\mathbf{x}_1$ is not sparse, the I/O read numbers to access $\{\mathbf{x}_1, \mathbf{z}_2, \mathbf{z}_3, \ldots, \mathbf{z}_{20}\}$ are {10, 6, 10, 6, 10, 10, 10, 10, 10, 4, 4, 6, 10, 6, 10, 6, 10, 8, 4, 6}. The total number of I/O reads to recover all the 20 versions is 156 (instead of 200 for the non-differential method). The total storage space for all the 20 versions, assuming a storage overhead of 2, is 312 (instead of 400 otherwise). The I/O read numbers to recover $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_{20}\}$ are {10, 16, 26, 32, 42, 52, 62, 72, 82, 86, 90, 96, 106, 112, 122, 128, 138, 146, 150, 156}, while for the optimized step, we get {10, 16, 10, 16, 10, 10, 10, 10, 10, 14, 18, 24, 10, 16, 10, 16, 10, 18, 22, 28}.
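The read counts of Example 3 follow mechanically from the per-object cost $\min(2\gamma_j, k)$, as the Python sketch below shows. The function names are hypothetical, and the optimized variant models only the restart on dense updates, not the additional iteration threshold.

```python
from itertools import accumulate

K = 10
# gamma_2, ..., gamma_20 from Example 3
PROFILE = [3, 8, 3, 6, 7, 9, 10, 6, 2, 2, 3, 9, 3, 9, 3, 10, 4, 2, 3]


def reads_per_object(gammas, k):
    """I/O reads to fetch x_1 followed by each stored difference z_j."""
    return [k] + [min(2 * g, k) for g in gammas]


def forward_reads(gammas, k):
    """Reads to rebuild x_l for l = 1..L under Step j+1 (running sum)."""
    return list(accumulate(reads_per_object(gammas, k)))


def optimized_forward_reads(gammas, k):
    """Reads to rebuild x_l when a dense update restarts the chain with a full version."""
    out, running = [k], k
    for g in gammas:
        running = k if 2 * g >= k else running + 2 * g
        out.append(running)
    return out


per_object = reads_per_object(PROFILE, K)
print(sum(per_object))                          # 156
print(2 * sum(per_object))                      # 312, storage size with overhead 2
print(forward_reads(PROFILE, K)[-1])            # 156
print(optimized_forward_reads(PROFILE, K)[-1])  # 28
```

The optimized step trades nothing in total storage but shortens the chain of differences that must be read for any single version.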

3 Reverse Differential Erasure Coding

In Table 1, we summarize the total storage size and the number of I/O reads required by the (forward) differential method. If some $\gamma_j$, $1 < j \le l$, are smaller than $\frac{k}{2}$, then the number of I/O reads for joint retrieval of all the versions $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_l\}$ is lower than that of the traditional method. However, this advantage comes at the cost of a higher number of I/O reads for accessing the $l$-th version $\mathbf{x}_l$ alone. Therefore, for applications where the latest archived versions are accessed more frequently than the joint versions, the overhead for reading the latest version outweighs the advantage of reading multiple versions. For such applications, we propose a variant of the differential method called the reverse DEC, wherein the order of storing the difference vectors is reversed.

3.1 Object Encoding 

As in Subsection 2.1, we assume that one node stores the latest version $\mathbf{x}_j$ and the new version $\mathbf{x}_{j+1}$ of a data object. Since $\mathbf{x}_j$ is readily obtained, caching is less critical here.

Step $j + 1$. Compute the difference vector $\mathbf{z}_{j+1} = \mathbf{x}_{j+1} - \mathbf{x}_j$ and its sparsity level $\gamma_{j+1}$. The object $\mathbf{x}_{j+1}$ is encoded as $\mathbf{c}_{j+1} = G\mathbf{x}_{j+1}$ and stored in $\mathcal{N}_{j+1}$. Furthermore, if $\gamma_{j+1} < \frac{k}{2}$, then $\mathbf{z}_{j+1}$ is first compressed as $\mathbf{z}'_{j+1} = \Phi_{\gamma_{j+1}} \mathbf{z}_{j+1}$, and then encoded as $\mathbf{c} = G_{\gamma_{j+1}} \mathbf{z}'_{j+1}$, where $G_{\gamma_{j+1}}$ is the generator matrix of an $(n_{\gamma_{j+1}}, 2\gamma_{j+1})$ erasure code. Finally, the preceding version $\mathbf{c}_j$ is overwritten as $\mathbf{c}_j = \mathbf{c}$.

TABLE 1
I/O access metrics for the traditional and the differential schemes to store $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_l\}$

| Parameter | Traditional | Forward differential | Reverse differential |
|---|---|---|---|
| I/O reads to read the $l$-th version | $k$ | $k + \sum_{j=2}^{l} \min(2\gamma_j, k)$ | $k$ |
| I/O reads to read the first $l$ versions | $lk$ | $k + \sum_{j=2}^{l} \min(2\gamma_j, k)$ | $k + \sum_{j=2}^{l} \min(2\gamma_j, k)$ |
| Number of encoding operations | 1 (on the latest version) | 1 (on the latest version) | 2 (on the latest and the preceding version) |
| Total storage size till the $l$-th version | $ln$ | $n + \sum_{j=2}^{l} \min(2\kappa\gamma_j, n)$ | $n + \sum_{j=2}^{l} \min(2\kappa\gamma_j, n)$ |

procedure ENCODE($\mathcal{X}$, $\mathcal{G}$, $\Sigma$)
    FOR $0 \le j \le L - 1$
        IF $j = 0$
            return $\mathbf{c}_1 = G\mathbf{x}_1$;
        ELSE (this part summarizes Step $j + 1$ in the text)
            $\mathbf{c}_{j+1} = G\mathbf{x}_{j+1}$;
            Compute $\mathbf{z}_{j+1} = \mathbf{x}_{j+1} - \mathbf{x}_j$;
            Compute $\gamma_{j+1}$;
            IF $\gamma_{j+1} < \frac{k}{2}$
                Compress $\mathbf{z}_{j+1}$ as $\mathbf{z}'_{j+1} = \Phi_{\gamma_{j+1}} \mathbf{z}_{j+1}$;
                return $\mathbf{c}_j = G_{\gamma_{j+1}} \mathbf{z}'_{j+1}$;
            END IF
        END IF
    END FOR
end procedure

Fig. 4. Encoding Procedure for the Reverse DEC

A key feature is that, in addition to encoding the latest version $\mathbf{x}_{j+1}$, the preceding version is also re-encoded depending on the sparsity level $\gamma_{j+1}$, resulting in two encoding operations (instead of one for the method in Subsection 2.1).

A summary of the encoding is provided in Fig. 4. The storage overhead for this method is the same as the one in Section 2. The considerations on data placement and static resilience of $\mathbf{c}_j$ in the set $\mathcal{N}_j$ of nodes are analogous as well, and an optimized version is obtained similarly as for the forward differential encoding.


3.2 Object Retrieval 

Suppose that $l$ versions of a data object have been archived, and the user needs to retrieve the latest version $\mathbf{x}_l$. In the reverse DEC, unlike Subsection 2.1, the latest version $\mathbf{x}_l$ is encoded as $G\mathbf{x}_l$. Hence, the user must access a minimum of $k$ nodes from the set $\mathcal{N}_l$ to recover $\mathbf{x}_l$. To retrieve all the $l$ versions $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_l\}$, the user accesses the nodes in the sets $\mathcal{N}_1, \mathcal{N}_2, \ldots, \mathcal{N}_l$ to retrieve $\mathbf{z}'_2, \mathbf{z}'_3, \ldots, \mathbf{z}'_l, \mathbf{x}_l$, respectively. The objects $\mathbf{z}_2, \mathbf{z}_3, \ldots, \mathbf{z}_l$ are recovered from $\mathbf{z}'_2, \mathbf{z}'_3, \ldots, \mathbf{z}'_l$, respectively, through a sparse-reconstruction procedure, and $\mathbf{x}_j$, $1 \le j \le l - 1$, is recursively reconstructed as

$$\mathbf{x}_j = \mathbf{x}_{j+1} - \mathbf{z}_{j+1}.$$

It is clear that a total of $k + \sum_{j=2}^{l} \min(2\gamma_j, k)$ reads are needed for accessing all the $l$ versions and only $k$ reads for the latest version. The performance metrics of the reverse DEC scheme are also summarized in Table 1 (the last column).

Example 4: For the sparsity profile of Example 3, the storage pattern using reverse DEC is $\{\mathbf{z}_2, \mathbf{z}_3, \ldots, \mathbf{z}_{20}, \mathbf{x}_{20}\}$. The I/O read numbers to access $\{\mathbf{z}_2, \mathbf{z}_3, \ldots, \mathbf{z}_{20}, \mathbf{x}_{20}\}$ are {6, 10, 6, 10, 10, 10, 10, 10, 4, 4, 6, 10, 6, 10, 6, 10, 8, 4, 6, 10}. The total storage size and the I/O reads to recover all the 20 versions are the same as those of the forward differential method. The I/O numbers to recover $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_{20}\}$ are {156, 150, 140, 134, 124, 114, 104, 94, 84, 80, 76, 70, 60, 54, 44, 38, 28, 20, 16, 10}. Note that the I/O number to access the latest version (in this case the 20th version) is lower than that of the forward differential scheme. For the optimized step, the corresponding I/O numbers are {16, 10, 16, 10, 10, 10, 10, 10, 24, 20, 16, 10, 16, 10, 16, 10, 28, 20, 16, 10}.
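The reverse scheme can be traced with the sketch below, which walks backwards from the fully stored latest version and adds the cost of each difference on the way. The function names are hypothetical, and the optimized variant again models only the rule that a dense update leaves the preceding version fully encoded.

```python
K = 10
# gamma_2, ..., gamma_20 from Example 3
PROFILE = [3, 8, 3, 6, 7, 9, 10, 6, 2, 2, 3, 9, 3, 9, 3, 10, 4, 2, 3]


def reverse_reads(gammas, k):
    """Reads to rebuild x_l (l = 1..L) when x_L is stored in full and z_2..z_L are kept."""
    reads = [k]                          # the latest version x_L
    for g in reversed(gammas):           # walk back through z_L, z_{L-1}, ..., z_2
        reads.append(reads[-1] + min(2 * g, k))
    return reads[::-1]                   # reorder as x_1, ..., x_L


def optimized_reverse_reads(gammas, k):
    """Same walk, but x_j stays fully encoded whenever the update z_{j+1} is dense."""
    reads = [k]
    for g in reversed(gammas):
        reads.append(k if 2 * g >= k else reads[-1] + 2 * g)
    return reads[::-1]


plain = reverse_reads(PROFILE, K)
print(plain[0], plain[-1])               # 156 10
print(optimized_reverse_reads(PROFILE, K))
```

The latest version costs only $k = 10$ reads, while the oldest one now carries the full chain of 156 reads, the mirror image of the forward scheme.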


4 Two-level Differential Erasure Coding


The differential encoding (both the forward and the reverse DEC) exploits the sparse nature of the updates to reduce the storage size and the number of I/O reads. Such advantages stem from the application of $\lceil \frac{k}{2} \rceil$ erasure codes matching the different levels of sparsity ($\lceil \frac{k}{2} \rceil - 1$ erasure codes for each $\gamma < \frac{k}{2}$ and one for $\gamma \ge \frac{k}{2}$). If $k$ is large, then the system needs a large number of erasure codes, resulting in an impractical strategy. In this section, we employ only two erasure codes, termed two-level differential erasure coding, for the sake of easier implementation, and refer to the earlier differential schemes in Sections 2 and 3 as $\lceil \frac{k}{2} \rceil$-level DEC schemes. We need the following ingredients for the two-level DEC scheme:

(1) An $(n, k)$ erasure code with generator matrix $G \in \mathbb{F}_q^{n \times k}$ to store the original data object.

(2) A measurement matrix $\Phi_T \in \mathbb{F}_q^{2T \times k}$ to compress sparse updates, where $T \in \{1, 2, \ldots, \lfloor \frac{k}{2} \rfloor\}$ is a chosen threshold.

(3) An $(n_T, 2T)$ erasure code with generator matrix $G_T \in \mathbb{F}_q^{n_T \times 2T}$ to store the compressed data object. The number $n_T$ is chosen such that $\kappa = \frac{n}{k} = \frac{n_T}{2T}$.

We discuss only the two-level forward DEC scheme. The two-level reverse DEC scheme is a straightforward variation.

procedure ENCODE($\mathcal{X}$, $G$, $G_T$, $\Phi_T$)
    FOR $0 \le j \le L - 1$
        IF $j = 0$
            return $\mathbf{c}_1 = G\mathbf{x}_1$;
        ELSE (this part summarizes Step $j + 1$ in the text)
            Compute $\mathbf{z}_{j+1} = \mathbf{x}_{j+1} - \mathbf{x}_j$;
            Compute $\gamma_{j+1}$;
            IF $\gamma_{j+1} > T$
                return $\mathbf{c}_{j+1} = G\mathbf{z}_{j+1}$;
            ELSE
                Compress $\mathbf{z}_{j+1}$ as $\mathbf{z}'_{j+1} = \Phi_T \mathbf{z}_{j+1}$;
                return $\mathbf{c}_{j+1} = G_T \mathbf{z}'_{j+1}$;
            END IF
        END IF
    END FOR
end procedure

Fig. 5. Encoding Procedure for Two-level DEC


4.1 Object Encoding 

The key point of this encoding is that the number of erasure codes (and the corresponding measurement matrices) to store the $\gamma$-sparse vectors for $1 \le \gamma < \frac{k}{2}$ is reduced from $\lceil \frac{k}{2} \rceil - 1$ to 1. Thus, based on the sparsity level, the update vector is either compressed and then archived, or archived as is. Formally:

Step $j + 1$. Once the version $\mathbf{x}_{j+1}$ is created, using $\mathbf{x}_j$ in the cache, the difference vector $\mathbf{z}_{j+1} = \mathbf{x}_{j+1} - \mathbf{x}_j$ and the corresponding sparsity level $\gamma_{j+1}$ are computed. If $\gamma_{j+1} > T$, the object $\mathbf{z}_{j+1}$ is encoded as $\mathbf{c}_{j+1} = G\mathbf{z}_{j+1}$; otherwise $\mathbf{z}_{j+1}$ is first compressed (see (3)) as $\mathbf{z}'_{j+1} = \Phi_T \mathbf{z}_{j+1}$, where the measurement matrix $\Phi_T \in \mathbb{F}_q^{2T \times k}$ is such that any $2T$ of its columns are linearly independent (see Proposition 1). Then, $\mathbf{z}'_{j+1}$ is encoded as $\mathbf{c}_{j+1} = G_T \mathbf{z}'_{j+1}$, where $G_T \in \mathbb{F}_q^{n_T \times 2T}$ is the generator matrix of an $(n_T, 2T)$ erasure code. The components of $\mathbf{c}_{j+1}$ are stored across the set $\mathcal{N}_{j+1}$ of nodes.

A summary of the encoding method is provided in Fig. 5.


4.2 On the Storage Overhead 

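The total storage size for the two-level DEC is $\delta(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_l) = n + \sum_{j=2}^{l} n_j$, where

$$n_j = \begin{cases} n, & \text{if } \gamma_j > T \\ \kappa\, 2T, & \text{otherwise.} \end{cases} \tag{10}$$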


4.3 Data Retrieval 

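Similarly to the $\frac{k}{2}$-level DEC scheme, the object $\mathbf{x}_l$ for
some $1 \le l \le L$ is reconstructed as $\mathbf{x}_l = \mathbf{x}_1 + \sum_{j=2}^{l} \mathbf{z}_j$, by
accessing the nodes in the sets $\mathcal{N}_1, \mathcal{N}_2, \ldots, \mathcal{N}_l$. To retrieve
$\mathbf{x}_1$, a minimum of $k$ I/O reads is needed. If $\mathbf{z}_j$ is $\gamma_j$-sparse
and $\gamma_j \le T$, then $\mathbf{z}'_j$ is first retrieved with $2T$ I/O reads;
second, $\mathbf{z}_j$ is decoded from $\mathbf{z}'_j$ and $\Phi_T$ through a sparse-
reconstruction procedure. On the other hand, if $\gamma_j > T$,
then $\mathbf{z}_j$ is recovered with $k$ I/O reads. Overall, the total
number of I/O reads for $\mathbf{x}_l$ in the differential setup is
$\eta(\mathbf{x}_l) = k + \sum_{j=2}^{l} \eta_j$, where

\[
\eta_j = \begin{cases} 2T, & \text{if } \gamma_j \le T \\ k, & \text{otherwise.} \end{cases} \tag{11}
\]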


Similarly, the total number of I/O reads to retrieve the
first $l$ versions is also $\eta(\mathbf{x}_1, \ldots, \mathbf{x}_l) = k + \sum_{j=2}^{l} \eta_j$.

Example 5: We apply the threshold $T = 3$ to the
sparsity profile in Example 3. The object $\mathbf{z}_{18}$ (with
$\gamma_{18} = 4$) is then archived without compression whereas
all objects with sparsity lower than or equal to 3 are
compressed using a $6 \times 10$ measurement matrix. The
I/O read numbers to access $\{\mathbf{x}_1, \mathbf{z}_2, \mathbf{z}_3, \ldots, \mathbf{z}_{20}\}$ are
{10, 6, 10, 6, 10, 10, 10, 10, 10, 6, 6, 6, 10, 6, 10, 6, 10, 10, 6, 6}.
The total number of I/O reads to access all the versions
is 164 and the corresponding storage size is 328.
Thus, with just two levels of compression, the storage
overhead is more than that of the 5-level DEC scheme but still
lower than 400.
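The totals in Example 5 follow directly from (10) and (11), as the short sketch below illustrates. The list `gammas` is a hypothetical sparsity profile: it is not the profile of Example 3 (which is not reproduced here) but was chosen only so that the resulting I/O reads match the numbers quoted in Examples 5 and 6. The function name is likewise an assumption.

```python
def two_level_costs(gammas, k, T, kappa):
    """I/O reads and storage for x_1, z_2, ..., z_L under the two-level DEC.

    gammas[i] is the sparsity level of z_{i+2}; reads follow (11) and
    storage follows (10), where n = kappa * k and n_T = kappa * 2T.
    """
    reads = [k] + [2 * T if g <= T else k for g in gammas]
    storage = [kappa * r for r in reads]
    return reads, storage

# Hypothetical profile gamma_2, ..., gamma_20 (assumption, see lead-in).
gammas = [3, 5, 3, 5, 5, 5, 5, 5, 2, 2, 3, 5, 3, 5, 3, 5, 4, 2, 3]
reads, storage = two_level_costs(gammas, k=10, T=3, kappa=2)
total_reads, total_storage = sum(reads), sum(storage)
```

For this profile the totals are 164 I/O reads and 328 storage units, in line with the example.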


4.4 Threshold Design Problem 

For the two-level DEC, the total number of I/O reads
and the storage size are random variables that are
respectively given by $\eta = k + \sum_{j=2}^{L} \eta_j$, where $\eta_j$ is given
in (11), and $\delta = n + \sum_{j=2}^{L} n_j$, where $n_j$ is given in (10).
Note that $\eta$ and $\delta$ are also dependent on the threshold
$T$. The threshold $T$ that minimizes the average values of
$\eta$ and $\delta$ is given by:

\[
T_{opt} = \arg\min_{T} \; w\,\mathbb{E}[\delta(\mathbf{x}_1, \mathbf{x}_2)] + (1 - w)\,\mathbb{E}[\eta(\mathbf{x}_1, \mathbf{x}_2)], \tag{12}
\]

where $0 \le w \le 1$ is a parameter that appropriately
weighs the importance of storage overhead and I/O
reads overhead, and $\mathbb{E}[\cdot]$ is the expectation operator over
the random variables $\{\Gamma_2, \Gamma_3, \ldots, \Gamma_L\}$. This optimization
depends on the underlying probability mass functions
(PMFs) on $\{\Gamma_j\}$, so we discuss the choice of the parameter
$1 \le T \le \lfloor \frac{k}{2} \rfloor$ in Section 5.
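For two versions, (10) and (11) give an expected read count of k + Pr(γ ≤ T)·2T + Pr(γ > T)·k and an expected storage of κ times that value, so the threshold can be found by exhaustive search. The sketch below illustrates this for a truncated exponential PMF; the function names are assumptions, and `w_storage` denotes the weight placed on the storage term (the I/O term receives `1 - w_storage`).

```python
import math

def truncated_exponential_pmf(k, alpha):
    """pmf[g - 1] = Pr(Gamma = g) for g = 1..k, proportional to exp(-alpha * g)."""
    weights = [math.exp(-alpha * g) for g in range(1, k + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def expected_reads(pmf, k, T):
    """E[eta(x_1, x_2)] for the two-level DEC with threshold T."""
    p_low = sum(pmf[:T])                       # Pr(gamma <= T)
    return k + p_low * 2 * T + (1 - p_low) * k

def best_threshold(pmf, k, kappa, w_storage):
    """Exhaustive search over 1 <= T <= floor(k / 2)."""
    def cost(T):
        eta = expected_reads(pmf, k, T)
        delta = kappa * eta                    # storage is proportional to reads
        return w_storage * delta + (1 - w_storage) * eta
    return min(range(1, k // 2 + 1), key=cost)

pmf = truncated_exponential_pmf(k=10, alpha=1.6)
T_opt = best_threshold(pmf, k=10, kappa=2, w_storage=0.5)
```

Because the two expectations are proportional in this setting, the selected threshold does not depend on the weight; for k = 10 and α = 1.6 the search returns T = 1 with about 13.6 expected reads.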

4.5 Cauchy Matrices for Two-level DEC 

Suppose that $\Phi_T \in \mathbb{F}_q^{2T \times k}$ is carved from a Cauchy
matrix [19]. A Cauchy matrix is such that any square
submatrix is full rank. Thus, there exists a $2\gamma_j \times k$
submatrix $\Phi_T(\mathcal{I}_{2\gamma_j}, :)$ of $\Phi_T$, where $\mathcal{I}_{2\gamma_j} \subset \{1, 2, \ldots, 2T\}$
represents the indices of $2\gamma_j$ rows, for which any $2\gamma_j$
columns are linearly independent, implying that the
observations $\mathbf{r} = \Phi_T(\mathcal{I}_{2\gamma_j}, :)\mathbf{z}_j$ can be retrieved from
$\mathcal{N}_j$ with $2\gamma_j$ I/O reads. Also, using $\mathbf{r}$ and $\Phi_T(\mathcal{I}_{2\gamma_j}, :)$,
the sparse update $\mathbf{z}_j$ can be decoded through a sparse-
reconstruction procedure. Thus, the number of I/O reads
to get $\mathbf{z}_j$ is reduced from $2T$ to $2\gamma_j$ when $\gamma_j < T$. This
procedure is applicable for any $\gamma_j \le T$. Therefore, a
$\gamma_j$-sparse vector with $\gamma_j \le T$ can be recovered with
$2\gamma_j$ I/O reads. The total number of I/O reads for $\mathbf{x}_l$
in the two-level DEC with Cauchy matrix is finally
$\eta(\mathbf{x}_l) = k + \sum_{j=2}^{l} \eta_j$, where

\[
\eta_j = \begin{cases} 2\gamma_j, & \text{if } \gamma_j \le T \\ k, & \text{otherwise.} \end{cases} \tag{13}
\]

Fig. 6. From top left, clockwise: Binomial type PMF in $p$
(for k = 20), truncated exponential PMF in $\alpha$ (for k = 10),
truncated Poisson PMF in $\lambda$ (for k = 12) and the uniform
PMF for different object lengths $k$. The x-axis of these
plots represents the support $\{1, 2, \ldots, k\}$ of the random
variable $\Gamma$.


Since the number of I/O reads is potentially different
compared to the case without Cauchy matrices, the
threshold design problem in (12) can result in different
answers for this case. We discuss this optimization
problem in Section 5.2.

Example 6: With a Cauchy matrix for $\Phi_T$ in Example
5, the I/O numbers to access $\{\mathbf{z}_2, \mathbf{z}_3, \ldots, \mathbf{z}_{20}, \mathbf{x}_{20}\}$ are
{10, 6, 10, 6, 10, 10, 10, 10, 10, 4, 4, 6, 10, 6, 10, 6, 10, 10, 4, 6},
which makes the total I/O reads 158. However, the total
storage size with Cauchy matrix continues to be 328.
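The read counts in (13) differ from (11) only for updates at or below the threshold, which the following sketch makes explicit. As before, `gammas` is a hypothetical sparsity profile chosen only to be consistent with the numbers quoted in Examples 5 and 6, and the function name is an assumption.

```python
def cauchy_reads(gammas, k, T):
    """I/O reads per (13): 2 * gamma_j if gamma_j <= T, else k; k reads for the full object."""
    return [k] + [2 * g if g <= T else k for g in gammas]

# Hypothetical profile of 19 sparsity levels (assumption, see lead-in).
gammas = [3, 5, 3, 5, 5, 5, 5, 5, 2, 2, 3, 5, 3, 5, 3, 5, 4, 2, 3]
reads = cauchy_reads(gammas, k=10, T=3)
total_reads = sum(reads)
```

Only the three 2-sparse updates benefit (4 reads instead of 6), which accounts for the drop from 164 to 158 total reads while the stored sizes are unchanged.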


5 Simulation Results 

In this section, we present experimental results on the
storage size and the number of I/O reads for the different
differential encoding schemes. We assume that
$\{\Gamma_j, 2 \le j \le L\}$ is a set of random variables and that its
realizations $\{\gamma_j, 2 \le j \le L\}$ are known. First we consider
a version-control system with L = 2, which is the worst-
case choice of L as more versions could reveal more
storage savings. This setting both (i) serves as a proof
of concept, and (ii) already shows the storage savings
for this simple case. Later, we also present experimental
results for a setup with L > 2 versions.

Fig. 7. Average percentage reduction in the I/O reads
and storage size for PMFs in Fig. 6 when L = 2. The
experimental results are presented in the same order as
that of the PMFs in Fig. 6.


5.1 System with L = 2 versions 

For L = 2, there is one random variable, denoted henceforth
as $\Gamma$, with realization $\gamma$. Since $\Gamma$ is a discrete random
variable with finite support, we test the following finite
support distributions for our experimental results on the
average number of I/O reads for the two versions and
the average storage size.

Binomial type PMF: This is a variation of the standard
Binomial distribution given by

\[
P_\Gamma(\gamma) = c \binom{k}{\gamma} p^{\gamma} (1 - p)^{k - \gamma}, \quad \gamma = 1, 2, \ldots, k, \tag{14}
\]

where $c = \frac{1}{1 - (1 - p)^k}$ is the normalizing constant. The
change is necessary since $\gamma = 0$ is not a valid event.

Truncated exponential PMF: This is a finite support
version of the exponential distribution in parameter $\alpha > 0$:

\[
P_\Gamma(\gamma) = c\, e^{-\alpha \gamma}. \tag{15}
\]

The constant $c$ is chosen such that $\sum_{\gamma=1}^{k} P_\Gamma(\gamma) = 1$.

Truncated Poisson PMF: This is a finite support version
of the Poisson distribution in parameter $\lambda$ given by

\[
P_\Gamma(\gamma) = c\, \frac{\lambda^{\gamma} e^{-\lambda}}{\gamma!}, \tag{16}
\]

where the constant $c$ is chosen such that $\sum_{\gamma=1}^{k} P_\Gamma(\gamma) = 1$.

Uniform PMF: This is the standard uniform distribution:

\[
P_\Gamma(\gamma) = \frac{1}{k}. \tag{17}
\]
In Fig. 6, we plot the PMFs in (14), (15), (16) and (17)
for various parameters. These PMFs are chosen to represent
a wide range of real-world data update scenarios, in
the absence of any standard benchmarking dataset.
The truncated exponential PMFs concentrate most of
their mass on the lower sparsity levels, yielding best
cases for the differential encodings. The uniform
distributions illustrate the benefits of the proposed methods
for update patterns with no bias on sparse values. The
Binomial distributions provide narrow and bell shaped
mass functions concentrated around different sparsity
levels. The Poisson PMFs model sparse updates spread
over the entire support and concentrated around the
center.
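For readers who wish to reproduce the four families of PMFs in (14) to (17), a minimal sketch follows. The function names are assumptions; each function returns a list whose entry at index γ - 1 is the probability of sparsity level γ, and normalization is done numerically rather than through a closed-form constant.

```python
import math

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def binomial_type_pmf(k, p):
    """(14): Binomial weights restricted to gamma = 1..k (gamma = 0 excluded)."""
    return normalize([math.comb(k, g) * p**g * (1 - p)**(k - g)
                      for g in range(1, k + 1)])

def truncated_exponential_pmf(k, alpha):
    """(15): weights proportional to exp(-alpha * gamma)."""
    return normalize([math.exp(-alpha * g) for g in range(1, k + 1)])

def truncated_poisson_pmf(k, lam):
    """(16): Poisson weights restricted to gamma = 1..k."""
    return normalize([lam**g * math.exp(-lam) / math.factorial(g)
                      for g in range(1, k + 1)])

def uniform_pmf(k):
    """(17): every sparsity level is equally likely."""
    return [1.0 / k] * k

pmfs = [binomial_type_pmf(20, 0.1), truncated_exponential_pmf(10, 1.6),
        truncated_poisson_pmf(12, 3.0), uniform_pmf(10)]
```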

For a given PMF $P_\Gamma(\gamma)$, the average storage size for
storing the first two versions is $\mathbb{E}[\delta(\mathbf{x}_1, \mathbf{x}_2)] = n +
\sum_{\gamma=1}^{k} P_\Gamma(\gamma) \min(2\gamma\kappa, n)$, where $n = \kappa k$. Similarly, the
average number of I/O reads to access the first two versions
is $\mathbb{E}[\eta(\mathbf{x}_1, \mathbf{x}_2)] = k + \sum_{\gamma=1}^{k} P_\Gamma(\gamma) \min(2\gamma, k)$. When
compared to the non-differential method, the average
percentage reduction in the I/O reads and the average
percentage reduction in the storage size are respectively
computed as

\[
\frac{2k - \mathbb{E}[\eta(\mathbf{x}_1, \mathbf{x}_2)]}{2k} \times 100
\quad \text{and} \quad
\frac{2n - \mathbb{E}[\delta(\mathbf{x}_1, \mathbf{x}_2)]}{2n} \times 100. \tag{18}
\]

Since $\delta(\mathbf{x}_1, \mathbf{x}_2) = \kappa\,\eta(\mathbf{x}_1, \mathbf{x}_2)$ and $\kappa$ is a constant, the
numbers in (18) are identical. In Fig. 7, we plot the
percentage reduction in the above quantities for the
PMFs displayed in Fig. 6. The plots show a significant
reduction in the I/O reads (and the storage size) when
the distributions are skewed towards smaller $\gamma$. However,
as expected, the reduction is marginal otherwise.
For the uniform distribution on $\Gamma$, the plot shows that the
advantage of the differential technique saturates for
large values of $k$.
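The two expectations and the percentages in (18) are simple to evaluate numerically. The sketch below does so for an arbitrary PMF given as a list (entry γ - 1 holds the probability of sparsity level γ); the function names and the uniform test case are illustrative assumptions.

```python
def average_costs(pmf, k, kappa):
    """Return (E[eta], E[delta]) for two versions under the k/2-level DEC."""
    n = kappa * k
    e_eta = k + sum(p * min(2 * g, k) for g, p in enumerate(pmf, start=1))
    e_delta = n + sum(p * min(2 * g * kappa, n) for g, p in enumerate(pmf, start=1))
    return e_eta, e_delta

def percent_reduction(pmf, k, kappa):
    """Percentage reduction in I/O reads and in storage size, as in (18)."""
    n = kappa * k
    e_eta, e_delta = average_costs(pmf, k, kappa)
    return (100 * (2 * k - e_eta) / (2 * k),
            100 * (2 * n - e_delta) / (2 * n))

uniform = [0.1] * 10                      # uniform PMF with k = 10
io_gain, storage_gain = percent_reduction(uniform, k=10, kappa=2)
```

For the uniform PMF with k = 10 both percentages come out at 10%, illustrating why the two quantities in (18) coincide.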

We have discussed how the differential technique
reduces the storage space at the cost of an increased
number of I/O reads for the latest version (here the
2nd version) when compared to the non-differential
method. For the basic differential encoding, the average
number of I/O reads to retrieve the 2nd version is
$\mathbb{E}[\eta(\mathbf{x}_2)] = \mathbb{E}[\eta(\mathbf{x}_1, \mathbf{x}_2)]$. However, for the optimized
encoding, $\mathbb{E}[\eta(\mathbf{x}_2)] = \sum_{\gamma=1}^{k} P_\Gamma(\gamma) f(\gamma)$, where $f(\gamma) = k + 2\gamma$
when $\gamma < \frac{k}{2}$, and $f(\gamma) = k$ otherwise. When compared
to the non-differential method, we compute the average
percentage increase in the I/O reads for retrieving the
2nd version for both the basic and the optimized methods.
Numbers for

\[
\frac{\mathbb{E}[\eta(\mathbf{x}_2)] - k}{k} \times 100 \tag{19}
\]

are shown in Fig. 8, which shows that the optimized
method reduces the excess number of I/O reads for the
2nd version.
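The comparison in (19) between the basic and the optimized encoding can be evaluated as in the sketch below. The function name and the uniform PMF used as input are assumptions for illustration.

```python
def percent_increase_latest(pmf, k, optimized):
    """Percentage increase (19) in I/O reads for the 2nd version.

    Basic encoding:     E[eta(x_2)] = k + sum_g P(g) * min(2g, k).
    Optimized encoding: E[eta(x_2)] = sum_g P(g) * f(g), with
                        f(g) = k + 2g if g < k/2 and f(g) = k otherwise.
    """
    if optimized:
        e = sum(p * (k + 2 * g if g < k / 2 else k)
                for g, p in enumerate(pmf, start=1))
    else:
        e = k + sum(p * min(2 * g, k) for g, p in enumerate(pmf, start=1))
    return 100 * (e - k) / k

uniform = [0.1] * 10
basic = percent_increase_latest(uniform, k=10, optimized=False)
optimized = percent_increase_latest(uniform, k=10, optimized=True)
```

For the uniform PMF with k = 10 this gives an 80% increase for the basic encoding against 20% for the optimized one.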


5.2 Two-Level DEC: Threshold Design for L = 2 
Versions 

We now present simulation results to choose the threshold
parameter $1 \le T \le \lfloor \frac{k}{2} \rfloor$ for the two-level DEC






Fig. 8. Average percentage increase in the I/O reads to
retrieve the 2nd version for the PMFs (in the same order)
in Fig. 6 when L = 2. The corresponding values of n and
k are the same as those of Fig. 7.


TABLE 2

Optimal threshold value for various PMFs with k = 10.

Binomial: k = 20 (non-differential scheme: η = 40 and δ = 80)

  p     T_opt   E[η] (2-level)   E[δ] (2-level)   E[η] (k/2-level)   E[δ] (k/2-level)
  0.1   3       28.11            56.23            24.55              49.10
  0.3   6       35.13            70.27            31.96              63.92
  0.5   8       38.99            77.98            38.23              76.47
  0.7   9       39.96            79.93            39.95              79.90

Truncated Exponential: k = 10 (non-differential scheme: η = 20 and δ = 40)

  α     T_opt   E[η] (2-level)   E[δ] (2-level)   E[η] (k/2-level)   E[δ] (k/2-level)
  1.6   1       13.61            27.23            12.50              25.01
  1.1   1       14.66            29.32            12.98              25.97
  0.6   2       15.79            31.59            14.19              28.39
  0.1   2       18.27            36.55            17.26              34.52

Truncated Poisson: k = 12 (non-differential scheme: η = 24 and δ = 48)

  λ     T_opt   E[η] (2-level)   E[δ] (2-level)   E[η] (k/2-level)   E[δ] (k/2-level)
  1     2       17.01            34.03            15.16              30.32
  3     3       20.22            40.45            18.20              36.41
  5     4       22.24            44.49            21.06              42.13
  7     4       23.29            46.58            22.79              45.58


scheme in Section 4.4. The optimization problem is given
in (12), where

\[
\mathbb{E}[\eta(\mathbf{x}_1, \mathbf{x}_2)] = k + \Pr(\gamma \le T)\, 2T + \Pr(\gamma > T)\, k,
\]

$\mathbb{E}[\delta(\mathbf{x}_1, \mathbf{x}_2)] = \kappa\, \mathbb{E}[\eta(\mathbf{x}_1, \mathbf{x}_2)]$ and $0 \le w \le 1$. Since
$\mathbb{E}[\delta(\mathbf{x}_1, \mathbf{x}_2)]$ and $\mathbb{E}[\eta(\mathbf{x}_1, \mathbf{x}_2)]$ are proportional, solving
(12) is equivalent to solving instead

\[
T_{opt} = \arg\min_{1 \le T \le \lfloor k/2 \rfloor} \mathbb{E}[\delta(\mathbf{x}_1, \mathbf{x}_2)]. \tag{20}
\]

In Table 2, we list the values of $T_{opt}$, obtained via
exhaustive search over $1 \le T \le \lfloor \frac{k}{2} \rfloor$, together with the average number
of I/O reads and the average storage size for the optimized
two-level DEC scheme and the $\frac{k}{2}$-level DEC scheme. We
denote $\mathbb{E}[\eta(\mathbf{x}_1, \mathbf{x}_2)]$ and $\mathbb{E}[\delta(\mathbf{x}_1, \mathbf{x}_2)]$ by $\mathbb{E}[\eta]$ and $\mathbb{E}[\delta]$,
respectively. To compute the average storage size, we
use $\kappa = 2$. We see that switching to just two levels of
compression incurs only a small loss in the I/O reads (or
storage size) when compared to the $\frac{k}{2}$-level DEC scheme.
Thus the two-level DEC scheme is a practical solution
to reap the benefits of the differential erasure coding
strategy.

Fig. 9. Average storage size $\mathbb{E}[\delta(\mathbf{x}_1, \mathbf{x}_2)]$ versus average
number of I/O reads $\mathbb{E}[\eta(\mathbf{x}_1, \mathbf{x}_2)]$, $1 \le T \le \lfloor \frac{k}{2} \rfloor = 5$, with
truncated exponential distribution. For each curve, points
from the left tip to the right tip correspond to $T = \{1, \ldots, \lfloor \frac{k}{2} \rfloor = 5\}$.

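When Cauchy matrices are used for $\Phi_T$, (12) has to be
solved with

\[
\mathbb{E}[\eta(\mathbf{x}_1, \mathbf{x}_2)] = k + \sum_{\gamma=1}^{T} P_\Gamma(\gamma)\, 2\gamma + \Pr(\gamma > T)\, k,
\]

\[
\mathbb{E}[\delta(\mathbf{x}_1, \mathbf{x}_2)] = n + \Pr(\gamma \le T)\, 2T\kappa + \Pr(\gamma > T)\, \kappa k.
\]

Unlike the non-Cauchy case, $\mathbb{E}[\eta(\mathbf{x}_1, \mathbf{x}_2)]$ and $\mathbb{E}[\delta(\mathbf{x}_1, \mathbf{x}_2)]$
are no longer proportional and $T_{opt}$ depends on $w$, $0 \le w \le 1$.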

To capture the dependency on $w$, we study the relation
between $\mathbb{E}[\eta(\mathbf{x}_1, \mathbf{x}_2)]$ and $\mathbb{E}[\delta(\mathbf{x}_1, \mathbf{x}_2)]$ for $1 \le T \le \lfloor \frac{k}{2} \rfloor$. In
Fig. 9, we plot $\{(\mathbb{E}[\delta(\mathbf{x}_1, \mathbf{x}_2)], \mathbb{E}[\eta(\mathbf{x}_1, \mathbf{x}_2)]), 1 \le T \le \frac{k}{2}\}$
for the exponential PMFs from Section 5.1. For each
curve there are $\frac{k}{2} = 5$ points corresponding to $T \in
\{1, 2, \ldots, 5\}$, in that sequence from the left tip to the right
one. The plots indicate the value of $T_{opt}(w)$ for the two
extreme values of $w$, i.e., $w = 0$ and $w = 1$. We further
study the curve corresponding to $\alpha = 0.6$. If minimizing
$\mathbb{E}[\eta(\mathbf{x}_1, \mathbf{x}_2)]$ is most important with no constraint on
$\mathbb{E}[\delta(\mathbf{x}_1, \mathbf{x}_2)]$ (i.e., $w = 1$), then choose $T_{opt}(1) = \frac{k}{2}$. This
option results in $\mathbb{E}[\eta(\mathbf{x}_1, \mathbf{x}_2)]$ which is as low as for the
$\frac{k}{2}$-level DEC scheme. If instead minimizing $\mathbb{E}[\delta(\mathbf{x}_1, \mathbf{x}_2)]$
is most important with no constraint on $\mathbb{E}[\eta(\mathbf{x}_1, \mathbf{x}_2)]$
(i.e., $w = 0$), then $T_{opt}(0) = 2$ results in $\mathbb{E}[\delta(\mathbf{x}_1, \mathbf{x}_2)]$
which is the same as for the 2-level DEC scheme with
non-Cauchy matrix. For other values of $w$, the optimal
value depends on whether $w > 0.5$. It can be found via
exhaustive search over $1 \le T \le \lfloor \frac{k}{2} \rfloor$. In summary, using
a Cauchy matrix for $\Phi_T$ reduces the average number of
I/O reads to that of the $\frac{k}{2}$-level DEC with just two levels
of compression.
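The trade-off between the two expectations in the Cauchy case can be explored with a small exhaustive search, sketched below for the truncated exponential PMF with α = 0.6 and k = 10. The function names are assumptions, and `w_storage` explicitly denotes the weight given to the storage term E[δ], with the I/O term E[η] receiving `1 - w_storage`.

```python
import math

def truncated_exponential_pmf(k, alpha):
    weights = [math.exp(-alpha * g) for g in range(1, k + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def cauchy_costs(pmf, k, kappa, T):
    """Return (E[eta], E[delta]) for the two-level DEC with a Cauchy matrix."""
    n = kappa * k
    p_low = sum(pmf[:T])                                   # Pr(gamma <= T)
    e_eta = (k + sum(p * 2 * g for g, p in enumerate(pmf[:T], start=1))
             + (1 - p_low) * k)
    e_delta = n + p_low * 2 * T * kappa + (1 - p_low) * n
    return e_eta, e_delta

def best_threshold(pmf, k, kappa, w_storage):
    """Exhaustive search over 1 <= T <= floor(k / 2)."""
    def cost(T):
        e_eta, e_delta = cauchy_costs(pmf, k, kappa, T)
        return w_storage * e_delta + (1 - w_storage) * e_eta
    return min(range(1, k // 2 + 1), key=cost)

pmf = truncated_exponential_pmf(k=10, alpha=0.6)
T_storage_only = best_threshold(pmf, 10, 2, w_storage=1.0)
T_io_only = best_threshold(pmf, 10, 2, w_storage=0.0)
```

Putting all the weight on storage selects T = 2, while putting all the weight on I/O reads selects a threshold at, or tied with, the largest allowed value, where the expected reads (about 14.2) match the k/2-level entry of Table 2.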




Fig. 10. Average percentage reduction in the I/O reads
and total storage size for PMFs in Fig. 6 when L = 10.
The experimental results are presented in the same order
as that of the PMFs in Fig. 6. Identical PMFs are used for
the random variables $\{\Gamma_j, 2 \le j \le 10\}$ to obtain the results.


5.3 Experimental Results for L > 2 

We present the average reduction in the total storage
size for a differential system with L = 10, assuming
identical PMFs on the sparsity levels for every version,
i.e., $P_{\Gamma_j}(\gamma_j) = P_\Gamma(\gamma)$ for each $2 \le j \le 10$. The average
percentage reduction in the total storage size and in the total
number of I/O reads are computed similarly to (18), and
are illustrated in Fig. 10. The plots show a further increase
in storage savings compared to the L = 2 case. In reality,
the PMFs across different versions may be different and
possibly correlated. These results are thus only indicative
of the magnitude of the savings from storing many versions
differentially.

To get better insights for L > 2, in Fig. 11 we plot 
the I/O numbers of Example [3] and [3] for L = 20. More 
than 20% storage space is saved with respect to the non¬ 
differential scheme, for only slightly higher I/O for the 
optimized DEC. 


6 Practical Differential Erasure Coding

So far, we developed a theoretical framework for the
DEC scheme under a fixed object length assumption across
successive versions of the data object. This
assumption typically does not hold in practice because of
insertions and deletions, which impact the length of the
updated object. In this section, we explain how to control
zero pads in the file structure so as to support insertions
and deletions in a file, while marginally impacting the
storage overheads: the variable object size DEC brings a
gain of 20% to 60% in storage overhead with respect to
naive techniques.

To exemplify the use of zero pads, consider storing a 
digital object of size 3781 units through a (12,8) erasure 
Fig. 11. I/O and storage for Examples 3 and 4. The left plots
provide the number of I/O reads to retrieve only the $l$-th
version for $1 \le l \le 20$. The right plots show the total
storage size till the $l$-th version for $1 \le l \le 20$. Results are
for forward and reverse differential methods, with basic
and optimized encoding.


code of symbol size 500 units, as shown in Fig. 12. Since
the object is encoded blockwise, 219 zero pads are added
to extend the object size to 4000 units. The zero pads
naturally absorb insertions made anywhere in the file,
as long as their total size is less than 219 units, thus
retaining the length of the updated version at 4000 units.
However, since the zero pads are placed at the end,
insertions made at the beginning of the file propagate
changes across the rest of the file. The difference object is
thus unlikely to exhibit sparsity. Alternatively, one could
distribute zero pads across the file at different places, as
shown in Fig. 12. Here 160 zero pads are distributed at
8 patches, with each patch containing 20 zero pads. This
strategy arrests propagation of changes when (small size)
insertions are made either at the beginning or in the middle of
the file.

Fig. 12. File structure (data object of 3781 units) with different placements of zero
pads (ZP): (i) ZP End, where the zero pads are concentrated
at the end (middle figure), and (ii) ZP Intermediate,
where the zero pads are distributed across the file (bottom
figure).

Despite zero padding looking like a natural way to
handle insertions, it is already clear from this example
that the optimization of the size and placements of zero
pads is not immediate. We defer this analysis to Section
7 and first describe the functioning of the variable
size DEC scheme.


6.1 DEC Step 1 for Variable Size Length Object

Let $\mathcal{F}_1$ be the first version of a file of size $V$ units. The
system distributes the file contents into several chunks,
each of size $\Delta$ units. Within each chunk, $\delta < \Delta$ units of
zero pads are allocated at the end while the rest is
dedicated to the file content. Thus, the $V$ units of the
file are spread across

\[
M = \left\lceil \frac{V}{\Delta - \delta} \right\rceil \tag{21}
\]

chunks $\{C_1, C_2, \ldots, C_M\}$, where $\lceil \cdot \rceil$ denotes the ceiling
operator. The zero pads added at the end of every
chunk promote sparsity in the difference between two
successive versions.

Once the file contents are divided into $M$ chunks, they
are stored across different servers, using an $(n, k)$ erasure
code: the code is applied on a block of $k$ data chunks to
output $n\,(> k)$ chunks, which include the data chunks
and $n - k$ encoded chunks that are generated to provide
fault tolerance against potential failures. The parameter
$k$ is optimized for the architecture with respect to $M$,
which is file dependent:

Case 1: When $M < k$, an additional $k - M$ chunks
containing zeros are appended to create a block of $k$
chunks. Henceforth, these additional chunks are referred
to as zero chunks. Then, the $k$ chunks are encoded using
an $(n, k)$ erasure code.

Case 2: When $M \ge k$, the $M$ chunks are divided into
$G = \lceil \frac{M}{k} \rceil$ groups $\mathcal{G}_1, \mathcal{G}_2, \ldots, \mathcal{G}_G$. The last group $\mathcal{G}_G$, if
found short of $k$ chunks, is appended with zero chunks.
The $k$ chunks in each group are encoded using an $(n, k)$
erasure code.

For the first version $\mathcal{F}_1$, the $G$ groups of chunks
together have $\delta M + N\Delta$ units of zero pads, where $1 \le
N < k$ represents the number of zero chunks added to
make $\mathcal{G}_G$ contain $k$ chunks. In addition, the $M$-th chunk
may have extra padding due to the rounding operation
in (21). The $\delta M$ units of zero pads that are distributed
across the chunks shield propagation of changes across
chunks when an insertion is made in subsequent file
versions. This object can now withstand a total of $\delta M$
units of insertion (anywhere in the file if $\delta M < N\Delta$) by
retaining $G$ groups for the second version.

We next discuss the use of zero pads while storing the
$(j+1)$-th version $\mathcal{F}_{j+1}$ of the file, $j \ge 1$.
6.2 DEC Step j + 1 under Insertions

For the $(j+1)$-th version, the DEC system is designed
to identify the difference in the file content size in every
chunk. Then the changes in the file contents are carefully
updated in the chunks, in increasing order of the
indices $1, 2, \ldots, M$, so as to minimize the number of
chunks modified due to changes in one chunk. For
$1 \le i \le M$, if the content of $C_i$ grows in size by at most
$\delta$ units, then some zero pads are removed to make space
for the expansion. This $C_i$ will have fewer zero pads than
in the first version. On the other hand, if the content of $C_i$
grows in size by more than $\delta$ units, then the first $\Delta$ units
of the file content are written to $C_i$ while the remaining
units are shifted to $C_{i+1}$. The existing content of $C_{i+1}$
is in turn shifted, and hence it will have fewer than $\delta$ zero
pads. The propagation of changes in the chunks
continues until all the changes in the file are reflected.

6.3 DEC Step j + 1 under Deletions 

We saw that placing zero pads stops propagation of
changes across chunks when inserting new contents.
When file contents are deleted, the zero pads continue
to block propagation, this time in the reverse direction.
Since a deletion reduces the size of the file contents
in a chunk, it is equivalent to adding zero
pads (of the same size as the deleted patch) to
the chunk, alongside the existing zero pads. After this
process, the metadata should reflect the total size of the
file contents (potentially less than $\Delta - \delta$) in the modified
chunk. Thus, deletion of file contents boosts the capacity
of the data structure to shield larger insertions in the next
versions.
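The insertion and deletion rules of Sections 6.2 and 6.3 can be illustrated by tracking only how many units of file content each chunk holds. This is a simplified model under stated assumptions: the function names are hypothetical, chunk indices are 0-based, and a chunk is counted as modified whenever its content changes or is shifted.

```python
def apply_insertion(contents, Delta, i, size):
    """Insert `size` units into chunk i; contents[c] is the content held by chunk c.

    Zero pads absorb the growth; any overflow beyond Delta units spills into
    the next chunk. Returns the indices of the modified chunks.
    """
    modified, carry, c = [], size, i
    while carry > 0:
        if c == len(contents):
            contents.append(0)                 # a new chunk becomes necessary
        contents[c] += carry
        modified.append(c)
        carry = max(0, contents[c] - Delta)    # units that do not fit
        contents[c] -= carry
        c += 1
    return modified

def apply_deletion(contents, i, size):
    """Delete `size` units from chunk i; the freed space acts as extra zero pads."""
    contents[i] -= size
    return [i]

intermediate = [480] * 7 + [421]               # 20 zero pads per chunk
small = apply_insertion(list(intermediate), 500, 0, 15)
large = apply_insertion(list(intermediate), 500, 0, 30)
end_only = apply_insertion([500] * 7 + [281], 500, 0, 15)
```

With pads in every chunk, a 15-unit insertion touches one chunk and a 30-unit insertion touches two, whereas with all pads at the end the same 15-unit insertion at the start touches all eight chunks.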

6.4 Encoding Difference Objects 

The preceding sections emphasized the need for shifting
the file contents across the chunks when the insertion
size is more than $\delta$ units. Digging further, if the insertion
size is large enough, then new chunks (or even new
groups) have to be added to the existing chunks (or
groups), thus changing the object size of the $(j+1)$-th
version. Note that the differential encoding strategy
requires two successive versions to have the same object
size to compute the difference. In particular, we adopt
the reverse DEC, wherein the latest version of the object
is stored in full while the preceding versions are stored
in a differential manner. Once the contents of the $(j+1)$-th
version are written to the chunks, we compute the
difference between the chunks of the $j$-th and the $(j+1)$-th
version. Then we declare a difference chunk to be non-zero
if it contains at least one non-zero element. Within a
group, if the number of non-zero chunks, say $\gamma$ of them,
is smaller than $\frac{k}{2}$, then the difference object is compressed
to contain $2\gamma$ chunks, before encoding them using a $\frac{k}{2}$-level
DEC scheme discussed in Section 2. We continue
this procedure of storing the difference objects as long as the
modified object size is at most $kG$ chunks. Note that one
could also use the two-level DEC scheme in Section 4 as
an alternate option to store the difference objects with
just two erasure codes.
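The chunk-level sparsity test described in this subsection can be sketched as follows. The function name and the representation of chunks as `bytes` objects are assumptions; a difference chunk is non-zero exactly when the old and new chunks differ.

```python
def stored_chunks_per_group(old_group, new_group):
    """Count non-zero difference chunks in one group and the chunks to store.

    old_group and new_group are lists of k equal-length bytes objects.
    Returns (gamma, stored), where stored = 2 * gamma if gamma < k / 2,
    and k (no compression) otherwise.
    """
    k = len(old_group)
    gamma = sum(1 for a, b in zip(old_group, new_group) if a != b)
    stored = 2 * gamma if gamma < k / 2 else k
    return gamma, stored

old = [bytes(4) for _ in range(8)]             # k = 8 chunks of 4 zero bytes
one_change = [b"\x01\x00\x00\x00"] + old[1:]
five_changes = [b"\x01\x00\x00\x00"] * 5 + old[5:]
sparse_case = stored_chunks_per_group(old, one_change)
dense_case = stored_chunks_per_group(old, five_changes)
```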

A set of consecutive versions of the file that maintains 
the same number of groups is referred to as a batch of 
versions, while the number of such versions within the 
batch is called the depth of the batch. The case when 
insertions change the group size is addressed next as a 
source for resetting the differential encoding strategy. 

6.5 Criteria to Reset DEC

Criterion 1: Starting from the second version, the process
of storing the difference objects continues as long as $G$
remains constant. When the changes require more than
$G$ groups, i.e., the updates require more than $kG$ chunks,
the system terminates the current batch, and then stores
the object in full by redistributing the file contents into a
new set of chunks. To illustrate this, let the $j$-th version of
the file (for some $j \ge 1$) be distributed across $M_j$ chunks,
where $\lceil \frac{M_j}{k} \rceil \le G$. Now, let the changes made to the
$(j+1)$-th version occupy $M_{j+1}$ chunks, where $\lceil \frac{M_{j+1}}{k} \rceil > G$.
At this juncture, we reorganize the file contents across
several chunks with $\delta$ units for zero pads (as done
for the first version). After re-initialization, this file has
$G' = \lceil \frac{M_{j+1}}{k} \rceil$ groups.

Criterion 2: Another criterion to reset is when the
number of non-zero chunks is at least $\frac{k}{2}$ within every
group. Due to insufficient sparsity in each group, there
would be no saving in storage size in this case, and as a
result, a new batch has to be started. However, a key
difference from Criterion 1 is that the contents of the
chunks are not reorganized since the group size has not
changed.
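The two reset criteria amount to a small decision procedure, sketched below. The function name, its arguments and the returned labels are hypothetical and serve only to make the logic explicit.

```python
import math

def reset_decision(M_next, k, G, nonzero_per_group):
    """Decide how to store the next version.

    M_next            number of chunks needed by the next version
    G                 number of groups in the current batch
    nonzero_per_group number of non-zero difference chunks in each group
    """
    if math.ceil(M_next / k) > G:
        return "reset_and_redistribute"        # Criterion 1: more groups needed
    if all(g >= k / 2 for g in nonzero_per_group):
        return "reset_keep_layout"             # Criterion 2: no group is sparse
    return "store_difference"                  # continue the current batch

grow = reset_decision(M_next=9, k=8, G=1, nonzero_per_group=[2])
dense = reset_decision(M_next=8, k=8, G=1, nonzero_per_group=[4])
sparse = reset_decision(M_next=8, k=8, G=1, nonzero_per_group=[3])
```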

7 Experiments with Practical DEC 

We conduct experiments with several synthetic workloads,
capturing a wide spectrum of realistic loads, to
demonstrate the efficacy of our scheme. The main
objectives are

1) to determine the right strategy to place the zero
pads in order to promote sufficient sparsity in the
difference object for different classes of workloads
(see Sections 7.1 and 7.2);

2) to compare the storage savings of DEC against
two baselines, namely (i) a system setup using
concepts from Rsync, which is fundamentally a
delta encoding technique for file transfer and file
synchronization systems, and (ii) a naive technique
where each version is fully coded and treated
as a distinct object, referred to as the non-differential
scheme (see Section 7.3).

Throughout this section, we use the reverse differential
method, where the order of storing the difference vectors
is reversed as $\{\mathbf{z}_2, \mathbf{z}_3, \ldots, \mathbf{z}_L, \mathbf{x}_L\}$, as it facilitates direct
access to the latest version of the object. Also, the DEC
scheme refers not to the primitive form discussed in
Section 2 but to its variant
discussed in Section 6. Unless specified otherwise, we
showcase only the best-case storage benefits that come
with the application of the $\frac{k}{2}$-level DEC scheme, wherein the
$\frac{k}{2}$ erasure codes are assumed to have identical storage
overhead of $\kappa = 2$.

For the DEC scheme storing two versions, i.e., L = 2,
the average storage size for the second version is given
by

\[
\mathbb{E}[\delta(\mathbf{z}_2)] = \kappa\, \mathbb{E}[\min(2\gamma, k)], \tag{22}
\]

which is the average size of the data object after erasure
coding. Since the storage overhead $\kappa$ is held constant for
all the $\frac{k}{2}$ erasure codes, we note that the quantity

\[
\frac{\mathbb{E}[\delta(\mathbf{z}_2)]}{\kappa} = \mathbb{E}[\min(2\gamma, k)], \tag{23}
\]

which is the average storage size prior to erasure coding,
is a sufficient statistic to evaluate the placement of zero
pads. Henceforth, we use (23) as the yardstick in our
analysis. However, in general, when storage overheads
are different, $\mathbb{E}[\delta(\mathbf{z}_2)]$ in (22) is a relevant metric for the
analysis.

Notice that unlike the quantities in Section 5, the
quantity in (23) includes raw data as well as zero pads.
This difference is attributed to a more realistic model
of the erasure coded versioning system in Section 6, where
the zero pads facilitate block encoding of arbitrary sized
data objects in addition to shielding the rippling effect
from insertions and deletions.


Fig. 13. Comparing different placements of zero pads
against insertions: average storage size (as given in (23))
for the 2nd version against workloads comprising random
insertions. For the top figures, workloads are bursty
insertions whose size is uniformly distributed in the interval
$[1, D]$ for $D \in \{5, 10, 30, 60\}$. For the bottom figures,
workloads are several single unit insertions whose quantity
is distributed uniformly in the interval $[1, P]$, where
$P \in \{5, 10, 30, 60\}$. Panel parameters: V = 3781, Δ = 500,
δ = 20, k = 8, and V = 3781, Δ = 200, δ = 8, k = 20.


7.1 Comparing Different Placements of Zero Pads 

We conduct several experiments to compare the storage
savings from the zero pad placements highlighted in
Fig. 12. The parameters for the experiment are $V = 3781$,
$\Delta = 500$, $\delta = 20$ and $k = 8$. The two schemes under
comparison are ZP End and ZP Intermediate (discussed
in Fig. 12), where the zero pads are allocated at the
end and at intermediate positions, respectively. Like the ZP
Intermediate scheme, the ZP End scheme also contains
$k = 8$ chunks (each of size $\Delta$); however, in this case, 219
zero pads appear at the end in the 8-th chunk. In general,
appending zero pads at the end of the data object is a
necessity to employ erasure codes of fixed block length.
Thus, for the parameters of our experiment, both the
ZP End and ZP Intermediate schemes initially have an equal
number of zero pads (but at different positions), and
hence the comparison is fair.

From our experiments, we compute the average numbers
in (23) when two classes of random insertions are
made to the first version, namely: (i) a single bursty
insertion whose size is uniformly distributed in the interval
$[1, D]$, for $D = 5, 10, 30, 60$, and (ii) several single unit
insertions uniformly distributed across the object, where
the number of insertions is uniformly distributed in the
interval $[1, P]$, for $P = 5, 10, 30, 60$. We repeat the
experiments 1000 times by generating random insertions
and then compute the average storage size of the compressed
object $\mathbf{z}_2$ (as given in (23)). In Fig. 13, we plot the
average storage size with the ZP End and ZP Intermediate
schemes. Similar plots are also presented in Fig. 13 (on
the right) with parameters $\Delta = 200$, $\delta = 8$ and $k = 20$
for the same object. The plots highlight the advantage of
distributing the zero pads, as intermediate zero pads can
arrest the propagation of changes. We conduct
more experiments for several classes of random deletions;
the results, presented in Fig. 14, highlight
the savings in storage size for the ZP Intermediate scheme.

Fig. 14. Comparing different placements of zero pads
against deletions: average storage size (as given in (23))
for the second version against workloads comprising random
deletions. For the top figures, workloads are single
bursty deletions whose size is uniformly distributed in
the interval $[1, E]$ for $E \in \{60, 200, 600\}$. For the bottom
figures, workloads are several single unit deletions whose
quantity is distributed uniformly in the interval $[1, Q]$,
where $Q \in \{5, 10, 30\}$. Panel parameters: V = 3781, Δ = 500,
δ = 20, k = 8, and V = 3781, Δ = 200, δ = 8, k = 20.

Fig. 15. Bit striping method to generate striped chunks.
The top figure depicts bit-level writing of data into the chunks.
The bottom figure depicts bit-level reading of data after striping.
This technique is suitable for uniformly distributed sparse
insertions.

Fig. 16. Comparison of DEC schemes with and without
bit striping. Average storage size (given in (23)) for the
second version against a workload that has 3 single unit
insertions with intra-distance uniformly distributed in the
interval $[\Delta - \delta - R, \Delta - \delta + R]$, where $R \in \{40, 80, 120\}$.
For the experiments, we use V = 3781, k = 8, $\Delta = 500$ and $\delta = 20$.
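To give a feel for the comparison between the two placements, the following Monte Carlo sketch estimates the average stored size under single bursty insertions. It is a deliberately simplified model, not a reproduction of the experiments: it tracks only content sizes per chunk, counts a chunk as modified whenever content is inserted into it or shifted through it, ignores the creation of new chunks, and reports min(2γ, k) chunks multiplied by Δ units. All names and the seed are assumptions.

```python
import random

def modified_chunks(contents, Delta, i, size):
    """Number of chunks touched when `size` units are inserted into chunk i."""
    count, carry, c = 0, size, i
    while carry > 0 and c < len(contents):
        carry = max(0, contents[c] + carry - Delta)   # overflow to next chunk
        count += 1
        c += 1
    return count

def average_size(contents, Delta, k, D, trials, rng):
    """Average of min(2 * gamma, k) * Delta over random bursty insertions."""
    V = sum(contents)
    total = 0
    for _ in range(trials):
        size = rng.randint(1, D)            # burst size, uniform on [1, D]
        pos = rng.randrange(V)              # content offset of the insertion
        i, acc = 0, contents[0]
        while pos >= acc:                   # find the chunk holding `pos`
            i += 1
            acc += contents[i]
        gamma = modified_chunks(contents, Delta, i, size)
        total += min(2 * gamma, k) * Delta
    return total / trials

zp_end = [500] * 7 + [281]                  # all 219 zero pads at the end
zp_mid = [480] * 7 + [421]                  # 20 zero pads in every chunk
rng = random.Random(0)
avg_end = average_size(zp_end, 500, 8, 30, 1000, rng)
avg_mid = average_size(zp_mid, 500, 8, 30, 1000, rng)
```

In this model the intermediate placement yields a clearly smaller average than the end placement, since a short burst is absorbed by the pads of one or two chunks instead of shifting every later chunk.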


7.2 Chunks with Bit Striping Strategy 

In this section, we analyze the right strategy to synthesize
chunks for workloads that involve several single
insertions with sufficient spacing. We first explain the
motivation for this special case using the following toy
example. Consider storing a data object of size $V = 3781$
units using the parameters $\Delta = 500$, $\delta = 20$, $k = 8$.
Assume that 3 single unit insertions are made to the
object at positions 1, 481 and 961, which translates to
modifications of the chunks $C_1$, $C_2$ and $C_3$, respectively.
Thus, due to just 3 single unit insertions, three chunks
are modified, because of which the difference object after
compression will be of size 3000 units (i.e., $2\gamma = 6$ chunks
of 500 units each). Instead, imagine
striping every chunk into $k$ partitions at the bit level
such that the $\delta$ zero pads are equally distributed across
the partitions (see the top figure in Fig. 15). Then, create
a new set of $k$ chunks as follows: create the $t$-th chunk,
for $1 \le t \le k$, by concatenating the contents in the $t$-th
partition of all the original chunks (see the bottom
figure in Fig. 15). By applying this striping method to the
toy example, we see that only one chunk (after striping)
is modified; hence, this strategy would need only 1000
units for storage after compression.
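The effect of striping on the toy example can be checked with a small index computation. The sketch below is a simplified model under stated assumptions: offsets are 0-based positions in the file content, each single unit insertion is assumed to be absorbed by nearby zero pads, `k` is assumed to divide Δ - δ, and the function name is hypothetical.

```python
def touched_chunks(positions, Delta, delta, k, striped):
    """Chunks modified by single unit insertions at the given content offsets.

    Without striping, an insertion modifies the chunk holding its offset.
    With striping, it modifies the striped chunk whose index equals the
    partition (within the original chunk) that holds the offset.
    """
    per_chunk = Delta - delta            # content units per original chunk
    per_part = per_chunk // k            # content units per partition
    touched = set()
    for p in positions:
        chunk, offset = divmod(p, per_chunk)
        touched.add(offset // per_part if striped else chunk)
    return touched

positions = [0, 480, 960]                # positions 1, 481 and 961, 0-based
plain = touched_chunks(positions, 500, 20, 8, striped=False)
striped = touched_chunks(positions, 500, 20, 8, striped=True)
plain_size = min(2 * len(plain), 8) * 500
striped_size = min(2 * len(striped), 8) * 500
```

Three conventional chunks are modified (3000 units after compression), but all three insertions fall in the first partition of their chunks and therefore in a single striped chunk (1000 units).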

For the above example, the insertions are spaced 
exactly at intra-distance Δ - δ units to highlight the 
benefits, although in practice the insertions only need 
to be approximately that far apart to reap the benefits. 
We conduct experiments by introducing 3 random 
insertions into the file, where the first position is 
chosen at random while the second and the third are 
chosen with intra-distance (with respect to the previous 
insertion) that is uniformly distributed in the interval 
[Δ - δ - R, Δ - δ + R], where R ∈ {40, 80, 120}. For this 
experiment, the average storage size for the second version 
(i.e., the size of the compressed object z_2 given in (23)) is 
presented in Fig. 16, which shows a significant reduction 
in storage for the striping method when compared to the 
conventional method. Notice that as R increases, there is 
a higher chance that neighboring insertions do not fall 
in the same partition number of different chunks, thus 
diminishing the gains. 
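
The experiment can be mimicked with a small Monte Carlo sketch. This is 
an illustration under stated assumptions, not the code behind Fig. 16: 
the function names are hypothetical, spacings are drawn as integers, 
chunk size 500 with 20 zero pads and k = 8 are used as defaults, the 
insertion positions refer to offsets in the original object, zero pads 
are assumed never to overflow, and the difference object is assumed to 
cost two chunks per modified chunk. 

```python
import random


def stored_units(positions, chunk_size, pad, k, striped):
    """Units stored for the difference object (2 chunks per modified chunk)."""
    payload = chunk_size - pad
    touched = set()
    for p in positions:
        chunk, offset = divmod(p - 1, payload)
        # Striped layout: the partition number selects the new chunk.
        touched.add(offset * k // payload if striped else chunk)
    return 2 * len(touched) * chunk_size


def average_storage(R, trials=2000, chunk_size=500, pad=20, k=8, seed=0):
    """Average storage over random triples of insertions with spacing jitter R."""
    rng = random.Random(seed)
    payload = chunk_size - pad
    capacity = k * payload  # data units that fit in k chunks
    totals = {"conventional": 0, "striped": 0}
    for _ in range(trials):
        # Leave room so that all three insertions stay inside the object.
        first = rng.randint(1, capacity - 2 * (payload + R))
        second = first + rng.randint(payload - R, payload + R)
        third = second + rng.randint(payload - R, payload + R)
        positions = [first, second, third]
        totals["conventional"] += stored_units(positions, chunk_size, pad, k, False)
        totals["striped"] += stored_units(positions, chunk_size, pad, k, True)
    return {name: total / trials for name, total in totals.items()}


for R in (0, 40, 80, 120):
    print(R, average_storage(R))
```

With R = 0 the three insertions always share a partition number, so the 
striped layout touches a single chunk; larger R lets them drift into 
different partitions. 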

We also test the striping method against two types of 
workloads, namely, the bursty insertion (with parameter 
D ∈ {5, 10, 30, 60}) and the randomly distributed single 
insertions with parameter P ∈ {5, 10, 30, 60}. For the 
workloads with single insertions, the spacing between 
the insertions is uniformly distributed and not necessarily 
at intra-distance Δ - δ. In Fig. 17, we present the 
average storage size for the second version (given in (23)) 
against such workloads. The plots show a significant loss 
for the striping method against the former workload (as 
it is not designed for such patterns), whereas its storage 
size is close to that of the conventional method against 
the latter workload. In summary, if the insertion pattern 
is known a priori to be distributed, then we advocate 
the use of the striping method, as it performs similarly 
to the conventional method in general, with the potential 
to reduce the storage size further for some specially 
distributed insertions. 

Fig. 17. Comparing DEC schemes with and without bit 
striping against bursty (the left plot) and randomly 
distributed single insertions (the right plot) with parameters 
D, P ∈ {5, 10, 30, 60}. 




[Figure panel titles: V = 3781, Δ = 500, δ = 20, k = 8 and V = 3781, Δ = 200, δ = 8, k = 20] 

Fig. 18. DEC vs. Rsync with respect to insertions: Average 
storage size for the 2nd version against workloads 
comprising random insertions. The parameters D and 
P are as defined for Fig. 13. The k/2-level DEC scheme 
applies an erasure code for each sparsity level, whereas 
the two-level DEC applies only two erasure codes based 
on the threshold T_opt. The left and the right plots are for 
bursty and distributed single insertions, respectively. 


7.3 Comparing Storage Savings from DEC against 
Rsync based Distributed Storage System 

An important yardstick for comparison is a system 
setup using concepts from Rsync [16], which is a delta 
encoding technique for file transfer and file synchronization 
systems. In the original Rsync algorithm, only 
the modified chunks between the successive versions 
are transferred across the servers, thereby reducing the 
communication bandwidth. With the application 
of Rsync ideas to versioned storage systems, the 
gains in the communication bandwidth translate 
to gains in the storage size. In particular, if γ < k/2 chunks 
are modified, then only those γ modified chunks are 
stored in the Rsync based scheme by keeping track of 
the indices of the modified chunks, whereas, in contrast, 
2γ chunks are stored in the DEC scheme without storing 
the indices of the modified chunks, while still being able 
to recover them accurately from the 2γ compressed chunks. 
Note that the Rsync based method is effective if the 
updated version retains its size and has few in-place 
modifications. However, if the object size changes due 
to insertions or deletions, or when changes propagate at 
the bit level, then dividing the updated version into 
fixed-size chunks need not result in sufficient sparsity in the 
difference object across versions. For the Rsync based 
method, although there are no preallocated zero pads, 
they indirectly appear at the end to generate k (or a 
multiple of k) chunks. 
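
The effect of a single insertion on the two layouts can be illustrated 
with a short sketch. It is a simplified model under our own assumptions, 
not the Rsync tool or the authors' code: the helper names and the small 
parameters (chunk size 10, 2 zero pads per chunk, k = 4, 30 data units) 
are hypothetical, the Rsync based scheme is modelled as fixed-size 
chunking with zero pads only at the end, and the storage cost is taken 
as one chunk per modified chunk for the Rsync based scheme and two per 
modified chunk for DEC, as stated above. 

```python
def fixed_chunks(data, chunk_size, k):
    """Rsync-style baseline: fixed-size chunks, zero pads only at the end."""
    padded = data + [0] * (chunk_size * k - len(data))
    return [padded[i * chunk_size:(i + 1) * chunk_size] for i in range(k)]


def padded_chunks(payloads, chunk_size):
    """DEC-style layout: every chunk is its payload followed by zero pads."""
    return [p + [0] * (chunk_size - len(p)) for p in payloads]


def insert_into_payloads(payloads, index, value, chunk_size):
    """Insert one unit at data offset `index`, absorbed by that chunk's pads."""
    updated = [list(p) for p in payloads]
    for chunk in updated:
        if index <= len(chunk):
            if len(chunk) == chunk_size:
                raise ValueError("zero pads exhausted in this chunk")
            chunk.insert(index, value)
            return updated
        index -= len(chunk)
    raise IndexError("insertion offset beyond the object")


def count_modified(old, new):
    """Number of chunk positions whose contents differ between versions."""
    return sum(a != b for a, b in zip(old, new))


chunk_size, pad, k = 10, 2, 4
data = list(range(1, 31))            # 30 distinct data units (first version)
payload = chunk_size - pad
payloads = [data[i * payload:(i + 1) * payload] for i in range(k)]

# Second version: a single unit (99) inserted at the very beginning.
gamma_rsync = count_modified(fixed_chunks(data, chunk_size, k),
                             fixed_chunks([99] + data, chunk_size, k))
gamma_dec = count_modified(
    padded_chunks(payloads, chunk_size),
    padded_chunks(insert_into_payloads(payloads, 0, 99, chunk_size), chunk_size))

rsync_units = gamma_rsync * chunk_size   # modified chunks stored as they are
dec_units = 2 * gamma_dec * chunk_size   # 2 * gamma compressed chunks
print(gamma_rsync, gamma_dec, rsync_units, dec_units)
```

In this model the insertion shifts every later unit in the fixed-size 
layout, so all chunks differ, whereas the per-chunk zero pads confine 
the change to one chunk. 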

We conduct more experiments to compare the storage 
savings offered by DEC and Rsync. This time the parameters 
of the experiment are V = 3871, Δ = 500, δ = 20, 
k = 8, and the workload includes random insertions 
with the same parameters as those for Fig. 13. Similar to 
the preceding experiments, in this section, the storage 
size of the second version includes raw data and zero 
pads. For the Rsync based method, zero pads appear 
at the end to generate k = 8 chunks from 
V = 3871 units of data. Since, for this experiment, 
the total number of zero pads is held constant for the 
two schemes, the comparison is fair. In addition to 
showcasing the savings of DEC, we present in Fig. 18 the 
savings of the two-level DEC scheme, where only two 
erasure codes are employed to cater to different levels of 
sparsity. For such a case, the threshold T_opt is empirically 
computed based on the insertion distribution. The plots 
presented in Fig. 18 highlight the storage savings of 
both the k/2-level DEC and the two-level DEC with respect 
to Rsync against bursty insertions (with parameter D). 
However, for distributed single insertions (with parameter 
P), only the k/2-level DEC outperforms Rsync, but not 
the two-level DEC. 


8 Concluding remarks 

This paper proposes differential erasure coding techniques 
for improving storage efficiency and I/O reads 
while archiving multiple versions of data. Our evaluations 
demonstrate significant savings in storage (up to 60%, 
depending on the workload, relative to the Rsync based 
baseline). Moreover, in comparison to a system storing 
every version individually, the optimized reverse DEC 
retains the same I/O performance for reading the latest 
version (which is most typical), while significantly 
reducing the I/O overheads when all versions are accessed, 
at the cost of a minor deterioration for fetching specific 
older versions (an infrequent event). Future work aims at 
integrating the proposed framework into full-fledged 
version management systems. 